Open clairedavies opened 1 year ago
Molecular taxonomy string from Jodie to help find the named taxa
SILVA v138
Bacteria; Cyanobacteria; Cyanobacteriia; Cyanobacteriales; Phormidiaceae; Trichodesmium
SILVA v138
Eukaryota; SAR; Alveolata; Dinoflagellata; Noctilucales; Noctiluca
Eukaryota; SAR; Stramenopiles; Ochrophyta; Raphidophyceae; Chattonellales; Chattonella
Eukaryota; SAR; Stramenopiles; Ochrophyta; Raphidophyceae; Chattonellales; Heterosigma
Eukaryota; SAR; Stramenopiles; Ochrophyta; Dictyochophyceae; Florenciellales; Pseudochattonella
PR2
Eukaryota; TSAR; Alveolata; Dinoflagellata; Noctilucophyceae; Noctilucales; Noctilucaceae; Noctiluca OR Noctilucales_X
Eukaryota; TSAR; Stramenopiles; Gyrista; Raphidophyceae; Raphidophyceae_X; Raphidophyceae_XX; Chattonella
Eukaryota; TSAR; Stramenopiles; Gyrista; Raphidophyceae; Raphidophyceae_X; Raphidophyceae_XX; Heterosigma
Eukaryota; TSAR; Stramenopiles; Gyrista; Dictyochophyceae; Dictyochophyceae_X; Florenciellales; Pseudochattonella
A test file at data/NRS_taxon_abundance.csv has been generated in branch 2.1.0. The modified script used to generate it, SSI_1c_make_atlas_input.py, and a new file, NRS_taxon_list.csv, are included in the same branch.
Input data includes:
Silva138 taxonomy classified by QIIME2 SKlearn.
searchfield = 'imos_site_code'
searchTerm = 'NRS%' (wildcard search; all NRS sites have the format NRS***)
returnfields = ['sample_id', 'depth', 'nrs_trip_code', 'nrs_sample_code', 'sample_integrity_warnings']
Samples containing sample integrity warnings were removed from the analysis.
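For anyone following along, the selection described above could be sketched roughly as follows. This is not the code in SSI_1c_make_atlas_input.py. The helper names (`matches_like`, `select_samples`) and the example rows are made up for illustration, and the wildcard handling only covers a trailing `%`.

```python
import csv
import io

SEARCH_FIELD = "imos_site_code"
SEARCH_TERM = "NRS%"  # SQL-style wildcard: any site code starting with "NRS"
RETURN_FIELDS = ["sample_id", "depth", "nrs_trip_code",
                 "nrs_sample_code", "sample_integrity_warnings"]


def matches_like(value, pattern):
    """Hypothetical helper: supports only a trailing '%' wildcard."""
    if pattern.endswith("%"):
        return value.startswith(pattern[:-1])
    return value == pattern


def select_samples(rows):
    """Keep NRS samples with no integrity warning, returning only RETURN_FIELDS."""
    kept = []
    for row in rows:
        if not matches_like(row.get(SEARCH_FIELD, ""), SEARCH_TERM):
            continue
        if (row.get("sample_integrity_warnings") or "").strip():
            continue  # sample carries a warning, so drop it
        kept.append({field: row.get(field, "") for field in RETURN_FIELDS})
    return kept


# Made-up rows, purely to exercise the filter.
EXAMPLE_CSV = """sample_id,imos_site_code,depth,nrs_trip_code,nrs_sample_code,sample_integrity_warnings
S1,NRSAAA,10,TRIP1,CODE1,
S2,NRSBBB,20,TRIP2,CODE2,possible contamination
S3,OTHER1,5,,,
"""

example_rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
selected = select_samples(example_rows)
```

Dropping the `NRS%` filter (to include all stations) would then amount to skipping the `matches_like` check.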
The file NRS_taxon_list.csv holds a list of taxa to be retrieved, with one taxon per line, using Silva138 taxonomy. Taxonomic levels are comma separated and each carries a taxonomic level prefix (e.g., d__<name>,p__<name>,c__<name>,o__<name>,f__<name>,g__<name>,s__<name>), as per current AM portal formatting (https://data.bioplatforms.com/bpa/otu/). Because this file is read by SSI_1c_make_atlas_input.py, any taxa listed in it will be included in the analysis, making it easy to add taxa of interest. The script should recognise and select the appropriate amplicon for Bacteria, Eukaryota and Archaea.
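As an illustration of the taxon list format, one line of NRS_taxon_list.csv could be parsed and mapped to an amplicon roughly like this. It is a sketch under assumptions, not the script itself: `parse_taxon_line` is a hypothetical helper, and the amplicon labels in `AMPLICON_BY_DOMAIN` are placeholders rather than the real amplicon names used in the database.

```python
RANK_PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]

# Placeholder labels (assumption): substitute the real amplicon names.
AMPLICON_BY_DOMAIN = {
    "Bacteria": "bacteria_amplicon",
    "Eukaryota": "eukaryote_amplicon",
    "Archaea": "archaea_amplicon",
}


def parse_taxon_line(line):
    """Split one taxon-list line into prefixed levels and pick an amplicon."""
    levels = [part.strip() for part in line.strip().split(",") if part.strip()]
    if not levels:
        raise ValueError("empty taxon line")
    if len(levels) > len(RANK_PREFIXES):
        raise ValueError("too many taxonomic levels: %r" % line)
    # Levels must appear in rank order, each with its expected prefix.
    for level, prefix in zip(levels, RANK_PREFIXES):
        if not level.startswith(prefix):
            raise ValueError("expected prefix %s on %r" % (prefix, level))
    domain = levels[0][len("d__"):]
    if domain not in AMPLICON_BY_DOMAIN:
        raise ValueError("unknown domain: %r" % domain)
    return {
        "levels": levels,
        "domain": domain,
        "amplicon": AMPLICON_BY_DOMAIN[domain],
        "deepest": levels[-1],  # e.g. g__Trichodesmium
    }


tricho = parse_taxon_line(
    "d__Bacteria,p__Cyanobacteria,c__Cyanobacteriia,"
    "o__Cyanobacteriales,f__Phormidiaceae,g__Trichodesmium"
)
```

Here `tricho["deepest"]` is the most specific level given on the line, which is what the output columns appear to be named after.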
Output file column format is:
sample_id,depth,nrs_trip_code,nrs_sample_code,amplicon,g__Chattonella_abundance_20K,g__Heterosigma_abundance_20K,g__Pseudochattonella_abundance_20K,g__Trichodesmium_abundance_20K,s__Noctiluca_scintillans_abundance_20K
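The column names above look like they are built from the deepest taxonomic level of each listed taxon plus a suffix. A rough sketch of that naming follows. `abundance_column` and `build_header` are hypothetical names, and sorting the taxon columns alphabetically is an assumption that happens to reproduce the order shown.

```python
META_COLUMNS = ["sample_id", "depth", "nrs_trip_code", "nrs_sample_code", "amplicon"]


def abundance_column(deepest_level, depth_label="20K"):
    """Build a column name such as g__Trichodesmium_abundance_20K."""
    return "%s_abundance_%s" % (deepest_level.replace(" ", "_"), depth_label)


def build_header(deepest_levels):
    """Metadata columns first, then one abundance column per taxon (sorted)."""
    return META_COLUMNS + sorted(abundance_column(t) for t in deepest_levels)


header = build_header([
    "g__Trichodesmium",
    "s__Noctiluca scintillans",
    "g__Chattonella",
    "g__Heterosigma",
    "g__Pseudochattonella",
])
header_line = ",".join(header)
```

Adding a species-level entry to the taxon list (e.g. one for a Trichodesmium species) would then just add another `s__..._abundance_20K` column.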
Feedback welcomed
Hi, thanks again for doing this. I just had a quick look at the data. It looks fine, but could I request that the Trichodesmium data be at the species level instead of the genus level?
So ideally there would be a column for T. erythraeum, T. thiebautii and a T. spp. for those that aren’t given a species name. There may be other species as well but I would definitely expect these two to be there.
We are interested in the distributions of the individual species, this is something that we can’t determine with light microscopy, so a real advantage of this data.
Thanks in advance
Please could you:
1) expand this dataset to include all stations (this is especially important for looking at Tricho in the GBR)
2) modify the data so that the abundance table is based on ASV sequences for ALL reads, so that I can get a better estimate of the proportional representation of the taxa in each sample
Down the track, probably just for the Tricho project, I would be looking for:
1) A data table of all the ASVs within the genus Trichodesmium. From this I can work out the proportion of nifH genes that are Tricho, and we may then want to look at those that aren't ......
Thanks again. Your efforts are appreciated. Jodie is across the details if you need some clarification. I will be on leave until 20th June so no rush.
ASV based table is presented in branch 2.1.0_ASV. I included total abundance and unique ASV counts for all ASV and for ASVs subsampled to 20K reads. This is extended to the trait numbers too.
Some samples do not have ASV/taxonomy info. This happens when a trait is present but none of the selected taxa are found in the sample. These can be excluded, but I thought it might be interesting info down the track.
I have given Jodie the rundown of the sheet so she should be able to answer any questions.
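For readers unfamiliar with the "subsampled to 20K reads" columns, here is a minimal sketch of one common way to do it: random subsampling without replacement to a fixed depth. The thread does not say how the subsampling in this branch is actually done, so this is an assumption, and `subsample_counts` and `summarise` are hypothetical names.

```python
import random


def subsample_counts(asv_counts, depth=20000, seed=0):
    """Randomly draw `depth` reads without replacement from ASV read counts.

    Returns None when the sample has fewer than `depth` reads.
    """
    total = sum(asv_counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per read, then sample without replacement.
    pool = [asv for asv, n in asv_counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    subsampled = {}
    for asv in drawn:
        subsampled[asv] = subsampled.get(asv, 0) + 1
    return subsampled


def summarise(asv_counts):
    """Total abundance and number of unique ASVs with at least one read."""
    return {
        "total_abundance": sum(asv_counts.values()),
        "unique_asvs": sum(1 for n in asv_counts.values() if n > 0),
    }


# Made-up counts for one sample.
full = {"asv_a": 30000, "asv_b": 9000, "asv_c": 1000}
sub = subsample_counts(full, depth=20000, seed=1)
full_summary = summarise(full)
sub_summary = summarise(sub)
```

Reporting both summaries side by side gives the "all ASV" and "20K" versions of total abundance and unique ASV counts described above.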
Thanks for the ASV table. Please can you add all stations to the NRS_taxon_abundance.csv table? At the moment you have searchTerm = 'NRS%'; please can you drop this filter. Thanks
Oops, I had a better look at the ASV table and can see what you have done now. I'll have a play with this; it looks like it has what we want in it.
Please can you add Trichosphaerium (genus) to the ASV table. Jodie will work out the labelling in your db based on the taxonomy.
Please add a data table to the repo that includes the abundance, count and read data for Trichodesmium (any species), Noctiluca scintillans and any Chattonella, Heterosigma and Pseudochattonella species (fish-killing HABs). Include only data from the NRS stations. I will need the NRS trip code and a sample depth to match to metadata.
The plan is to plot a timeseries of 'relative abundance' for each taxon, with a seasonal climatology. Maybe plot a map of relative abundances.
If the data is not rarefied, I will also need the total number of reads per sample, along with the reads of each taxon.
Happy to chat more if this isn't clear, not always sure of your terminology.
Thank you
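To make the 'relative abundance' request above concrete, here is a small sketch assuming relative abundance means taxon reads divided by total reads in the sample, and that the climatology is a mean per calendar month. Both definitions are assumptions, the function names are hypothetical, and the sample values are made up.

```python
from collections import defaultdict


def relative_abundance(taxon_reads, total_reads):
    """Fraction of a sample's reads assigned to the taxon (None if no reads)."""
    if total_reads <= 0:
        return None
    return taxon_reads / total_reads


def monthly_climatology(records):
    """Mean relative abundance per calendar month.

    `records` is an iterable of (month, relative_abundance) pairs.
    """
    by_month = defaultdict(list)
    for month, value in records:
        if value is not None:
            by_month[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


# Made-up samples: (ISO date, taxon reads, total reads).
samples = [
    ("2019-01-10", 250, 1000),
    ("2020-01-22", 750, 1000),
    ("2020-07-05", 100, 400),
    ("2020-07-19", 0, 0),  # no reads, so excluded from the mean
]

records = [
    (int(date[5:7]), relative_abundance(taxon, total))
    for date, taxon, total in samples
]
climatology = monthly_climatology(records)
```

With unrarefied data this needs the total reads per sample alongside the per-taxon reads, which is why both are requested.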