Closed Smithmania closed 1 year ago
Users need to be able to access metagenome information for sites.
Similar interface to the existing bpaotu web app, but instead of the sites list below the search buttons linking to a download page (e.g. https://data.bioplatforms.com//organization/australian-microbiome?q=sample_id:102.100.100/138359 ), the site id links/buttons need to open up a metagenome panel, much like https://data.microbiomedata.org/ does. See example table below. Could either expand the site label when clicked or open up a modal dialog.
Sites with metagenome information will be present in the taxonomy files with an OTU string starting with `mxa_`, so this interface doesn't need to see all sites, just those with a corresponding OTU code of `mxa_something`.
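The filtering described above could be sketched as follows. This is only an illustration: it assumes the taxonomy data is available as (sample id, OTU code) pairs, and the names `METAGENOME_OTU_PREFIX` and `metagenome_sample_ids` are hypothetical, not taken from bpaotu.

```python
from typing import Iterable, Set, Tuple

# Prefix named in the issue text; everything else here is an illustrative assumption.
METAGENOME_OTU_PREFIX = "mxa_"


def metagenome_sample_ids(rows: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return sample ids that have at least one OTU code starting with mxa_.

    `rows` is a hypothetical iterable of (sample_id, otu_code) pairs.
    """
    return {
        sample_id
        for sample_id, otu_code in rows
        if otu_code.startswith(METAGENOME_OTU_PREFIX)
    }
```

In the real application this would more likely be a database query than an in-memory filter; the point is only that the `mxa_` prefix is enough to select the relevant sites.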
Could either have some kind of toggle switch on the UI to switch between regular and metagenome mode, or maybe just use a URL query string to select metagenome mode, and then link to that from some landing page.
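For the query-string option, a minimal sketch of the check might look like this. The parameter name `mode` and the value `metagenome` are assumptions made for illustration; the issue does not specify them, and in practice the check would probably live in the frontend.

```python
from urllib.parse import parse_qs, urlparse


def is_metagenome_mode(url: str) -> bool:
    """Return True if the URL's query string selects metagenome mode.

    The parameter name ("mode") and value ("metagenome") are hypothetical.
    """
    query = parse_qs(urlparse(url).query)
    return "metagenome" in query.get("mode", [])
```

A landing page could then simply link to the search page with that parameter set.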
Download URLs could probably be formulated with a simple pathname convention incorporating the site id e.g. https://download.example.com/something/$site_id/$something_else/something.xxx
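A rough sketch of such a pathname convention, using the placeholder host from the example above. The function name, the `subdir` argument and the directory layout are all hypothetical; no convention has been agreed at this point.

```python
from urllib.parse import quote

# Placeholder base URL from the issue text; the real host and layout are undecided.
BASE_URL = "https://download.example.com/something"


def download_url(site_id: str, subdir: str, filename: str) -> str:
    """Build a download URL from a site id, a sub-directory and a file name.

    Each component is percent-encoded so that ids containing "/" stay in
    a single path segment.
    """
    parts = [quote(part, safe="") for part in (site_id, subdir, filename)]
    return "/".join([BASE_URL] + parts)
```

Encoding each component matters for ids such as `102.100.100/138359`, which would otherwise add an extra path segment.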
Data Object Type | Data Object Description | File Size | Downloads | Download |
---|---|---|---|---|
Workflow Activity: Read QC Activity for nmdc:mga0khk038 | | | | |
Filtered Sequencing Reads | Reads QC result fastq (clean data) | 9.1 GiB | 0 | |
Workflow Activity: Assembly Activity for nmdc:mga0khk038 | | | | |
Assembly Coverage BAM | Sorted bam file of reads mapping back to the final assembly | 10.8 GiB | 0 | |
Assembly Scaffolds | Final assembly scaffolds fasta | 1.5 GiB | 0 | |
Assembly Contigs | Final assembly contigs fasta | 1.5 GiB | 0 | |
Workflow Activity: Annotation Activity for nmdc:mga0khk038 | | | | |
Annotation KEGG Orthology | Tab delimited file for KO annotation | 42.0 MiB | 0 | |
Structural Annotation GFF | GFF3 format file with structural annotations | 204.0 MiB | 0 | |
Annotation Enzyme Commission | Tab delimited file for EC annotation | 27.3 MiB | 0 | |
Functional Annotation GFF | GFF3 format file with functional annotations | 371.6 MiB | 0 | |
Annotation Amino Acid FASTA | FASTA amino acid file for annotated proteins | 425.2 MiB | 0 | |
Workflow Activity: MAGs Analysis Activity for nmdc:mga0khk038 | | | | |
CheckM Statistics | CheckM statistics report | 765 B | 0 | |
Data is provided by files named with sample id
hou098@terrible-hf:/mnt/data/work/amd/Metagenome_QC_reads$ pwd
/mnt/data/work/amd/Metagenome_QC_reads
hou098@terrible-hf:/mnt/data/work/amd/Metagenome_QC_reads$ ls|head -25
10714_HFLF3BCXX-1_merged.fastq.gz
10714_HFLF3BCXX-1_R1p.fastq.gz
10714_HFLF3BCXX-1_R1R2u.fastq.gz
10714_HFLF3BCXX-1_R2p.fastq.gz
10714.md5
10716_HFLF3BCXX-1_merged.fastq.gz
10716_HFLF3BCXX-1_R1p.fastq.gz
10716_HFLF3BCXX-1_R1R2u.fastq.gz
10716_HFLF3BCXX-1_R2p.fastq.gz
10716.md5
10718_HFLF3BCXX-2_merged.fastq.gz
10718_HFLF3BCXX-2_R1p.fastq.gz
10718_HFLF3BCXX-2_R1R2u.fastq.gz
10718_HFLF3BCXX-2_R2p.fastq.gz
10718.md5
10720_HFLF3BCXX-2_merged.fastq.gz
10720_HFLF3BCXX-2_R1p.fastq.gz
10720_HFLF3BCXX-2_R1R2u.fastq.gz
10720_HFLF3BCXX-2_R2p.fastq.gz
10720.md5
12424_combined_merged.fastq.gz
12424_combined_R1p.fastq.gz
12424_combined_R1R2u.fastq.gz
12424_combined_R2p.fastq.gz
12424.md5
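
Going by the listing above, the file names can be grouped per sample with a small parser. This is a sketch based only on the names shown: the patterns assume a numeric sample id followed by a run label and one of the four read types, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Patterns inferred from the directory listing; other layouts are not handled.
READS_RE = re.compile(
    r"(?P<sample>\d+)_(?P<run>.+)_(?P<kind>merged|R1p|R2p|R1R2u)\.fastq\.gz"
)
MD5_RE = re.compile(r"(?P<sample>\d+)\.md5")


def group_by_sample(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map sample id -> {file kind -> file name}; unrecognised names are skipped."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = READS_RE.fullmatch(name)
        if match:
            grouped[match["sample"]][match["kind"]] = name
            continue
        match = MD5_RE.fullmatch(name)
        if match:
            grouped[match["sample"]]["md5"] = name
    return dict(grouped)
```

The run label (`HFLF3BCXX-1`, `combined`) is matched but not used here, since only the sample id and the read type are needed to decide what to offer for download.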
File descriptions and paths to example data for the bpa-otu metagenome enhancements from @Smithmania
File naming convention needs a bit more thought. We probably want everything to be of the form $sampleid-*
Data object type | Data object description | Data object methodology | Data object example file |
---|---|---|---|
Filtered sequencing reads - sampleID_*_R1p.fastq.gz | Quality filtered R1 paired reads | BBtools QC protocol | /datasets/work/oa-amd/work/amd/Metagenome_QC_reads/21644_combined_R1p.fastq.gz |
Filtered sequencing reads - sampleID_*_R2p.fastq.gz | Quality filtered R2 paired reads | BBtools QC protocol | /datasets/work/oa-amd/work/amd/Metagenome_QC_reads/21644_combined_R2p.fastq.gz |
Filtered sequencing reads - sampleID_*_merged.fastq.gz | Quality filtered merged reads | BBtools QC protocol | /datasets/work/oa-amd/work/amd/Metagenome_QC_reads/21644_combined_merged.fastq.gz |
Filtered sequencing reads - sampleID_*_R1R2u.fastq.gz | Quality filtered unpaired reads | BBtools QC protocol | /datasets/work/oa-amd/work/amd/Metagenome_QC_reads/21644_combined_R1R2u.fastq.gz |
checksum - sampleID.md5 | md5 sum of above files | | /datasets/work/oa-amd/work/amd/Metagenome_QC_reads/21644.md5 |
Workflow activity: Assembly activity | |||
Assembly - 01.sampleID.fasta | Fasta file containing the contigs from the assembly | Squeezemeta full workflow - input R1, R2 | /datasets/work/oa-env-gen/work/Smith/Hadza/results/01.Hadza.fasta |
Assembly statistics - 01.sampleID.lon | Length of the contigs | Squeezemeta full workflow - input R1, R2 | /datasets/work/oa-env-gen/work/Smith/Hadza/results/intermediate/01.Hadza.lon |
Assembly statistics - 01.sampleID.stats | Assembly statistics (N50, N90, number of reads, etc) | Squeezemeta full workflow - input R1, R2 | /datasets/work/oa-env-gen/work/Smith/Hadza/results/intermediate/01.Hadza.stats |
BINNING: There are a number of options and outputs here. Bins are calculated using metabat2 and maxbin and combined using DAStool; for an example see the directories at /datasets/work/oa-env-gen/work/Smith/Hadza/results/ . Perhaps we can supply all binned fasta files, or supply only the fasta files associated with DAStool merged bins (only one example is included here) and include the final summary table (19.sampleID.bintable) as below | |||
Assembly - maxbin.002.fasta.contigs.fa | Fasta file containing binned metagenomic reads | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/DAS/Hadza_DASTool_bins/maxbin.002.fasta.contigs.fa |
Annotation - 19.sampleID.bintable | Compilation of all data for bins | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/19.Hadza.bintable |
Workflow activity: Annotation activity | |||
Annotation - sampleID_sqm_reads.out.allreads | Taxonomic and functional assignments for each read | Squeezemeta reads - input: R1p,merged,R1R2u | /datasets/work/oa-amd/work/amd-work/SQM_READS/21644/21644_sqm_reads.out.allreads |
Annotation - sampleID_sqm_reads.out.allreads.funcog | Abundance of all COG functions | Squeezemeta reads - input: R1p,merged,R1R2u | /datasets/work/oa-amd/work/amd-work/SQM_READS/21644/21644_sqm_reads.out.allreads.funcog |
Annotation - sampleID_sqm_reads.out.allreads.funkegg | Abundance of all KEGG functions | Squeezemeta reads - input: R1p,merged,R1R2u | /datasets/work/oa-amd/work/amd-work/SQM_READS/21644/21644_sqm_reads.out.allreads.funkegg |
Annotation - sampleID_sqm_reads.out.allreads.mcount | Abundance of all taxa | Squeezemeta reads - input: R1p,merged,R1R2u | /datasets/work/oa-amd/work/amd-work/SQM_READS/21644/21644_sqm_reads.out.allreads.mcount |
Annotation - sampleID_sqm_reads.out.allreads.mappingstat | Summary of total reads and hits to nr | Squeezemeta reads - input: R1p,merged,R1R2u | /datasets/work/oa-amd/work/amd-work/SQM_READS/21644/21644_sqm_reads.out.mappingstat |
checksum for SQM reads - sampleID.md5 | md5sum of SQM_reads files | ||
Annotation - 02.sampleID.16S.txt | Assignment (RDP classifier) for the 16S rRNA sequences found | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/02.Hadza.16S.txt |
Annotation - 02.sampleID.rnas | Fasta file containing all RNAs found | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/02.Hadza.rnas |
Annotation - 02.sampleID.trnas | Text file containing contig and position of tRNAs found | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/02.Hadza.trnas |
Annotation - 02.sampleID.trnas.fasta | Fasta file containing the contigs resulting from the assembly, masking the positions where a tRNA was found | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/02.Hadza.trnas.fasta |
Annotation - 02.sampleID.maskedrna.fasta | Fasta file containing the contigs resulting from the assembly, masking the positions where an RNA was found | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/intermediate/02.Hadza.maskedrna.fasta |
Annotation - 03.sampleID.faa | Amino acid sequences for predicted ORFs | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/03.Hadza.faa |
Annotation - 03.sampleID.fna | Nucleotide sequences for predicted ORFs | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/03.Hadza.fna |
Annotation - 03.sampleID.gff | Features and position in contigs for each of the predicted genes | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/03.Hadza.gff |
Annotation - 06.sampleID.fun3.tax.noidfilter.wranks | Taxonomic assignments not considering identity filters for each ORF, including taxonomic ranks | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/06.Hadza.fun3.tax.noidfilter.wranks |
Annotation - 06.sampleID.fun3.tax.wranks | Taxonomic assignments for each ORF, including taxonomic ranks | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/06.Hadza.fun3.tax.wranks |
Annotation - 07.sampleID.fun3.cog | COG functional assignment for each ORF | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/07.Hadza.fun3.cog |
Annotation - 07.sampleID.fun3.kegg | KEGG functional assignment for each ORF | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/07.Hadza.fun3.kegg |
Annotation - 07.sampleID.fun3.pfam | PFAM functional assignment for each ORF | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/07.Hadza.fun3.pfam |
Annotation statistics - 10.sampleID.mappingstat | Mapping percentage of reads to samples | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/10.Hadza.mappingstat |
Annotation statistics - 10.sampleID.mapcount | Several measures regarding mapping of reads to ORFs | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/intermediate/10.Hadza.mapcount |
Annotation statistics - 10.sampleID.contigcov | Several measures regarding mapping of reads to contigs | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/intermediate/10.Hadza.contigcov |
Annotation - 11.sampleID.mcount | Abundance table of taxa | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/11.Hadza.mcount |
Annotation - 12.sampleID.cog.funcover | Measurements of the abundance and distribution of each COG | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/12.Hadza.cog.funcover |
Annotation - 12.sampleID.kegg.funcover | Measurements of the abundance and distribution of each KEGG | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/12.Hadza.kegg.funcover |
Annotation - 13.sampleID.orftable | Several measures regarding ORF characteristics | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/13.Hadza.orftable |
Annotation - 20.sampleID.contigtable | Compilation of data for contigs | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/20.Hadza.contigtable |
Annotation - 21.sampleID.kegg.pathways | Prediction of KEGG pathways in bins | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/21.Hadza.kegg.pathways |
Annotation - 21.sampleID.metacyc.pathways | Prediction of Metacyc pathways in bins | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/21.Hadza.metacyc.pathways |
Annotation statistics - 22.sampleID.stats | Several statistics regarding ORFs, contigs and bins | Squeezemeta full workflow - input R1, R2 Barrnap | /datasets/work/oa-env-gen/work/Smith/Hadza/results/22.Hadza.stats |
checksum for SQM full workflow - sampleID.md5 | md5sum of SQM full workflow | No file generated yet |
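
Since each group of files comes with a `sampleID.md5` checksum file, a download could be checked along these lines. This is a sketch only: it assumes the `.md5` files are in the usual `md5sum` output format (hex digest, whitespace, file name) and sit in the same directory as the files they describe, which the table does not confirm. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_file(md5_path: Path) -> Dict[str, bool]:
    """Check every entry of an md5sum-style file (assumed format).

    Returns {file name: True if the file exists and its digest matches}.
    """
    results: Dict[str, bool] = {}
    for line in md5_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = md5_path.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte fastq.gz files.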
What about the download buttons? Do … … make any sense in metagenome mode? Maybe replace "Download OTU and Contextual Data" with "Download metagenome files", which pops up a modal dialog containing checkboxes for the various metagenome files, then downloads a zip file of the selected files for the selected sites?
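The zip idea could be sketched roughly as below. It assumes, purely for illustration, a flat directory in which each file is named `<sample_id><suffix>`; the function name and that layout are hypothetical, and a real implementation would more likely stream the archive than build it in memory.

```python
import io
import zipfile
from pathlib import Path
from typing import Iterable


def build_zip(root: Path, sample_ids: Iterable[str], suffixes: Iterable[str]) -> bytes:
    """Zip every existing file named <sample_id><suffix> found under root.

    Missing sample/file combinations are skipped silently.
    """
    suffixes = list(suffixes)  # reused for every sample id
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for sample_id in sample_ids:
            for suffix in suffixes:
                path = root / f"{sample_id}{suffix}"
                if path.is_file():
                    # One folder per sample inside the archive.
                    archive.write(path, arcname=f"{sample_id}/{path.name}")
    return buffer.getvalue()
```

The checkboxes in the dialog would map to `suffixes`, and the sites in the current search result to `sample_ids`.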
What about map display?
After talking to @abissett 21 March 2022:
… `metaxa_from_metagenomes`, probably because there is no 20k abundance data. Fix this to just show sites.
For metagenome data, replace … with "Download metagenome data (CSV)". Pop up a modal dialog to select required files for selected sites using checkboxes and provide a "download" button.
Under construction in https://github.com/BioplatformsAustralia/bpaotu/tree/metagenome-feature-WIP All frontend stuff so far, with stubs in a few places.
Discussed with @mtearle on 12 April 2022.
… `ckanapi`. See https://usersupport.bioplatforms.com/programmatic_access.html
Depends on https://github.com/BioplatformsAustralia/bpaotu/issues/198 to allow filtering by map location.
Done: https://github.com/BioplatformsAustralia/bpaotu/commit/b3e6cf5560477c486e75cc89615f250a6e507064
example of secondary data from BPA dataportal (threatened species initiative)
https://data.bioplatforms.com/dataset/bpa-tsi-genome-assembly-359774
Initial metagenome data for one sample for testing: https://data.bioplatforms.com/dataset/bpa-amdb-metagenomics-analysed-21645
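As a rough illustration of programmatic access to a dataset like the one above, the sketch below builds a CKAN Action API `package_show` URL using only the standard library. It assumes the data portal exposes the standard CKAN `/api/3/action/` endpoints; it does not use `ckanapi` itself and does not cover authentication, which the portal may require. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Portal host taken from the dataset link; the endpoint path is the standard
# CKAN Action API layout and is assumed to apply here.
PORTAL = "https://data.bioplatforms.com"


def package_show_url(dataset_id: str, portal: str = PORTAL) -> str:
    """Return the CKAN Action API URL that describes one dataset."""
    return f"{portal}/api/3/action/package_show?{urlencode({'id': dataset_id})}"
```

For the test dataset this gives `https://data.bioplatforms.com/api/3/action/package_show?id=bpa-amdb-metagenomics-analysed-21645`; the JSON response of such a call lists the dataset's resources and their download URLs.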
Per-sample metagenome downloads now working in https://github.com/BioplatformsAustralia/bpaotu/commit/3815a36ab660751775bd0f8d682d4b9cb016c94b
Bulk downloads (i.e. multiple samples, multiple metagenome files) are still a work-in-progress. See https://github.com/BioplatformsAustralia/bpaotu/blob/3815a36ab660751775bd0f8d682d4b9cb016c94b/frontend/src/pages/search_page/components/metagenome_modal.tsx#L65
Bulk downloads implemented in a1c82f41758127ae5526e741bef870ef1825712e
Ready to test as of e564bc4e99756517d05e80cd6839f3c2d7db8b6f ( tag: 1.35.1-metagenomedemo4 )
Implemented in https://github.com/BioplatformsAustralia/bpaotu/tree/1.36.0
Docs related to new metagenome features