Open nekrut opened 2 weeks ago
Ok thx @nekrut we will start on this and collect the tables from NCBI...
Also link to UCSC genome browser in the genome file.
Sry, not sure this is the right place for this comment.. but were it me I'd seriously consider adding some kinetoplastids to that list of initial taxa.
T. cruzi, T. brucei, Leish major, Leish donovani, Leish braziliensis
Those are the ones coming to me off the top of my head, though I feel like that's maybe missing a big leish species or two. I might not have the spelling quite right either.. it'd give you Chagas, African sleeping sickness and iirc all three forms of leish, though I need to double check that. Considering the popularity of TriTrypDB and the impact of these diseases, these species would be a very notable omission.
Also, pretty sure we now have a few locally acquired cases of mucosal leish in Texas, as the sandfly habitat expands, so there's 'local' relevance.. thanks global warming
Here is the initial set of species: https://docs.google.com/spreadsheets/d/1Gg9sw2Qw765tOx2To53XkTAn-RAMiBtqYrfItlLXXrc/edit?usp=sharing
(replaces #153 )
@nekrut Question -- how can we map the genomes returned by NCBI to the UCSC Browser URLs specified in assemblyList.json? Previously we matched Genome Version/Assembly ID from this genomes spreadsheet with either genBank or refSeq from the assembly list, but I'm not familiar enough with what the fields mean to determine which ID(s) from the NCBI API would be necessary to match with the ones in the assembly list.
Thanks!
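One way the matching asked about here could work, as a rough sketch only. It assumes each entry in assemblyList.json is an object carrying the genBank and refSeq fields mentioned above, and that each NCBI genome report exposes its own accession plus a paired_accession for its GenBank/RefSeq counterpart. The paired_accession field name, the helper names, and the sample values are assumptions to check against the real data.

```python
def index_assemblies(assembly_list):
    """Map every genBank/refSeq accession to its assembly-list entry."""
    index = {}
    for entry in assembly_list:
        for key in ("genBank", "refSeq"):
            acc = entry.get(key)
            if acc:
                index[acc] = entry
    return index


def find_assembly(report, index):
    """Try the report's own accession first, then its paired accession."""
    for key in ("accession", "paired_accession"):
        acc = report.get(key)
        if acc in index:
            return index[acc]
    return None


# Made-up placeholder data, only to show the shape of the lookup.
assembly_list = [{"genBank": "GCA_000000001.1", "refSeq": "GCF_000000001.1"}]
report = {"accession": "GCF_000000001.1", "paired_accession": "GCA_000000001.1"}

index = index_assemblies(assembly_list)
match = find_assembly(report, index)
print(match is assembly_list[0])
```

Indexing on both fields means a report can be matched whether the assembly list recorded the GenBank or the RefSeq accession for that genome.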
Good point. They need to be built first. I will initiate the process over the weekend. This can happen very quickly, but for now let's not link them to UCSC yet.
This issue illustrates how the NCBI Datasets API can be used to generate the JSON blobs necessary for rendering filament pages (https://github.com/galaxyproject/brc-analytics/issues/130).
Linked Tickets
Non viral data
The initial set of taxa will be limited to these species: https://docs.google.com/spreadsheets/d/1Gg9sw2Qw765tOx2To53XkTAn-RAMiBtqYrfItlLXXrc/edit?usp=sharing
List view
The following API call is used:
This generates the following response:
From this response we would like to render the following fields on a page (only showing two rows)
These are populated from:
reports -> taxonomy -> current_scientific_name -> name
reports -> taxonomy -> taxid
reports -> taxonomy -> counts[0]
Genomes page
Now let's suppose that on the previous page a user ticked both the Anopheles gambiae and Coccidioides immitis checkboxes and clicked the "Go to Genomes" button.
This will be equivalent to passing the following GET request:
https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon/7165%2C5501/dataset_report?filters.assembly_source=refseq&filters.has_annotation=true&filters.exclude_paired_reports=true&filters.exclude_atypical=true&filters.assembly_level=scaffold&filters.assembly_level=chromosome&filters.assembly_level=complete_genome
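A minimal sketch of how this request could be assembled from the selected taxonomy IDs. The endpoint path and filter names are copied from the URL above; the helper name `build_genome_report_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Base endpoint and filters are taken verbatim from the GET request above.
BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/taxon"

FILTERS = [
    ("filters.assembly_source", "refseq"),
    ("filters.has_annotation", "true"),
    ("filters.exclude_paired_reports", "true"),
    ("filters.exclude_atypical", "true"),
    ("filters.assembly_level", "scaffold"),
    ("filters.assembly_level", "chromosome"),
    ("filters.assembly_level", "complete_genome"),
]


def build_genome_report_url(tax_ids):
    """Hypothetical helper: build the dataset_report URL for the given taxonomy IDs."""
    # Join the IDs with a comma and percent-encode it (',' becomes '%2C').
    taxa = quote(",".join(str(t) for t in tax_ids), safe="")
    return f"{BASE}/{taxa}/dataset_report?{urlencode(FILTERS)}"


# 7165 and 5501 are the two taxonomy IDs used in the request above.
url = build_genome_report_url([7165, 5501])
print(url)
```

Keeping the filters as a list of pairs (rather than a dict) lets `filters.assembly_level` repeat, which the request above relies on.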
The response to this request will be rendered as the following genome page:
organism -> organism_name
organism -> tax_id
accession
assembly_info -> refseq_category
assembly_info -> assembly_level
assembly_stats -> total_number_of_chromosomes
assembly_stats -> total_sequence_length
assembly_stats -> number_of_scaffolds
assembly_stats -> scaffold_n50
assembly_stats -> scaffold_l50
assembly_stats -> gc_percent
annotation_info -> status
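The field paths above could be pulled out of a single genome report along these lines. This is a sketch, not the project's implementation: the helper names are hypothetical, the example report holds made-up placeholder values, and missing fields are returned as `None` on the assumption that not every report carries every field.

```python
# Field paths copied from the list above; each tuple is a chain of keys.
GENOME_FIELDS = [
    ("organism", "organism_name"),
    ("organism", "tax_id"),
    ("accession",),
    ("assembly_info", "refseq_category"),
    ("assembly_info", "assembly_level"),
    ("assembly_stats", "total_number_of_chromosomes"),
    ("assembly_stats", "total_sequence_length"),
    ("assembly_stats", "number_of_scaffolds"),
    ("assembly_stats", "scaffold_n50"),
    ("assembly_stats", "scaffold_l50"),
    ("assembly_stats", "gc_percent"),
    ("annotation_info", "status"),
]


def get_path(obj, path):
    """Follow a chain of keys; return None if any step is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def genome_row(report):
    """Hypothetical helper: flatten one report into a {column: value} row."""
    return {" -> ".join(path): get_path(report, path) for path in GENOME_FIELDS}


# Placeholder report with made-up values, only to show the shape.
example_report = {
    "organism": {"organism_name": "Example organism", "tax_id": 1},
    "accession": "GCF_000000000.0",
    "assembly_info": {"assembly_level": "Chromosome"},
}

row = genome_row(example_report)
print(row["organism -> organism_name"])
print(row["annotation_info -> status"])
```

Applying `genome_row` to each report in the response would give one table row per genome, with the column order fixed by `GENOME_FIELDS`.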