GLOMICON / asvBiomXchange

A repository to develop an exchange format for molecular biodiversity data

Australian Microbiome Initiative #6

Open jodievandekamp opened 5 years ago

jodievandekamp commented 5 years ago

Fantastic opportunity and we're very keen to contribute! I'll set up a folder for the Australian Microbiome Initiative and provide links to all of our current ASV workflows (16S, 18S, fungal ITS), along with our data portal.

pbuttigieg commented 4 years ago

Great @jodievandekamp ! Looking forward to it.

We'd be keen on having a few BIOM files with some exchangeable data to test the approach. In the AWI folder (see #7), we'll also be including some of our dada2 code to build the ASV tables. I'd recommend adding a copy of your workflow there, so we can create a consensus approach.

If you have pre-processing code (primer clipping, quality trimming, merging, etc), that would be good to include too.

pbuttigieg commented 4 years ago

Hi @jodievandekamp please see #11 - dockerised workflows are likely to be the order of the day for exchangeability between observatories.

raissameyer commented 4 years ago

Hi @jodievandekamp ,

Are you still keen to contribute?

I'm working with Pier on setting up a prototype for the ASV data exchange and I'm looking for a few datasets from GLOMICON members to work with.

@cuttlefishh has kindly shared some data from the EMP already, and I'm keen to include some from Australia 😃

I have a few months to focus on this before I have to start writing up.

All I'd need to start is access to the .fastq files (the INSDC accession numbers if the data are already public, or some other way to access them if not) as well as the metadata and taxonomy, if you have any.

Of course I'll post the results and my code to this repo once it is tested. Hope we can work together!

jodievandekamp commented 4 years ago

Hi @raissameyer,

Definitely keen to contribute and help where we can! I originally provided some biom files which I see have been merged now. Is there anything else you need from me?

Cheers

Jodie van de Kamp, Research Scientist, CSIRO

raissameyer commented 4 years ago

Hi @jodievandekamp ,

Thank you for the .biom files and the accompanying README you so kindly shared!

Do you happen to know whether the associated raw data are publicly available and, if so, where to find the INSDC accession numbers?

We'd like to reproduce a few OTU tables using different in silico technologies to get a sense of variability from the amplicon analysis methods across observatories.

jodievandekamp commented 4 years ago

Hi @raissameyer,

No problem, and yes, the data are publicly available. There are two ways to access them:

  1. Through the Aus Microbiome data portal: https://data.bioplatforms.com/organization/75a9f0a5-fc60-4af6-b455-32e2011d969b?q=MAI+amplicons&sort=score+desc%2C+metadata_modified+desc

In the search bar write MAI amplicons and it will bring up all 1036 fastq files (bac, arc and euks). You can then download a zip file with a readme on downloading the raw files.

  2. NCBI

In the attached metadata file there is a column with the NCBI Biosample IDs.

Let me know if you need anything further.

Cheers, Jodie

mm-genomics-amplicon_IMOS_NRSMAI_metadata.txt

raissameyer commented 4 years ago

Hi @jodievandekamp

Thanks for guiding me to the raw data, I am keen to start working with it!

Cheers, Raïssa

jodievandekamp commented 4 years ago

Fantastic, @raissameyer! Let me know if you need anything further.

raissameyer commented 4 years ago

Hi @jodievandekamp

While having a closer look at the 16S amplicon data I realised that multiple run accession numbers belong to a single sample accession number referenced in the metadata table. Do you know what the differences between the runs were? Which forward-reverse FASTQ pair would you suggest using per sample?

Cheers, Raïssa

jodievandekamp commented 4 years ago

Hi @raissameyer,

Did you just download the Maria Island (MAI) data? If so, yes, there would be 12 samples (e.g. 21644-21655, surface depth across a one-year period) with multiple run accessions. There were two runs because, when we switched from 454 to MiSeq, we piloted the 12 samples on a single MiSeq run to see what the data were like. Moving forward, we sent in our entire sample set for MiSeq and, rather than us cherry-picking the 12 samples out of the plates, they were simply sequenced again.

For consistency's sake, I would use the AHGA0 (bac 16S) or AHYFU (arc 16S) runs, as these were full runs of samples. Let me know if this isn't the double-up you were talking about.

Hope the project is going well!

Cheers, Jodie
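
The run choice described above could be scripted roughly as follows. This is only a minimal sketch: the function name `pick_runs`, the `(sample_id, run_accession, run_label)` record layout, and the idea that each run carries a label such as AHGA0 or AHYFU in a parseable field are assumptions for illustration, not something taken from the Australian Microbiome metadata file or the INSDC records.

```python
from collections import defaultdict

# Run labels named in this thread as the full MiSeq runs.
PREFERRED_RUN_LABELS = frozenset({"AHGA0", "AHYFU"})


def pick_runs(records, preferred=PREFERRED_RUN_LABELS):
    """Return {sample_id: run_accession}, keeping exactly one run per sample.

    records: iterable of (sample_id, run_accession, run_label) tuples
             (hypothetical layout, see the note above).
    A sample with a single run keeps that run. A sample with several runs
    keeps the one whose label is in `preferred`. If that choice is not
    unique, a ValueError is raised instead of guessing.
    """
    runs_by_sample = defaultdict(list)
    for sample_id, run_accession, run_label in records:
        runs_by_sample[sample_id].append((run_accession, run_label))

    chosen = {}
    for sample_id, runs in runs_by_sample.items():
        if len(runs) == 1:
            chosen[sample_id] = runs[0][0]
            continue
        matches = [acc for acc, label in runs if label in preferred]
        if len(matches) != 1:
            raise ValueError(
                f"sample {sample_id}: expected one preferred run, "
                f"found {len(matches)}"
            )
        chosen[sample_id] = matches[0]
    return chosen


if __name__ == "__main__":
    # Placeholder accessions and pilot label, for illustration only.
    example = [
        ("21644", "run_pilot", "PILOT"),
        ("21644", "run_full", "AHGA0"),
        ("99999", "run_only", "OTHER"),
    ]
    print(pick_runs(example))
```

Raising an error on ambiguous samples, rather than silently taking the first run, keeps any unexpected duplication visible.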

raissameyer commented 3 years ago

Hi @jodievandekamp

Sorry for the silence here. I’ve just started a three-year project, so I’ll soon pick this up. Yes, that was the duplication I was referring to. Thanks a lot for clarifying.