jennaj commented 8 years ago

usegalaxy.org

CONVERT THIS ISSUE TO A PROJECT @jennaj

This list changes over time as new data sources are targeted for indexing and user requests are considered. See posts below for genome batches completed and in progress.

Current plans are to bring http://usegalaxy.org up to date with UCSC's released genomes, indexed for all tools, so those do not need to be requested by users at this time.

Main:

[ ] Automate index creation on Main https://github.com/galaxyproject/usegalaxy-playbook/issues/38

Other reference data:

[ ] https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-220096290 - Add hg38 MAF
[ ] https://github.com/galaxyproject/usegalaxy-playbook/issues/80 - Update Kraken databases
[ ] https://github.com/galaxyproject/tools-devteam/issues/390 - Update SICER to include more genome builds (ideally all on the instance). Not an indexing/DM issue exactly, but should it be?

Admin/Local Data and DM usage enhancements are included here.

[ ] https://github.com/galaxyproject/galaxy/issues/1471 - Methods to manage genomes, Admin enhancement ideas
[ ] https://github.com/galaxyproject/galaxy/issues/1516 - Sort view by dbkey Admin -> Local Data -> View Tool Data Table Entries
[ ] https://github.com/galaxyproject/galaxy/issues/1518 - Add dbkey to report and sort by it in Admin -> Local Data -> View Data Manager Jobs
[ ] https://github.com/galaxyproject/tools-devteam/issues/319 - Modify Bowtie2 DM so that it will not create duplicates when "Include Tophat2" is used
[ ] https://github.com/galaxyproject/tools-iuc/issues/530 - HISAT2 Data manager reports unrecognized arguments for SNP annotation
[ ] https://github.com/galaxyproject/galaxy/issues/1904 - LiftOver DM request

Resolved data issues:

[x] Some indexes are associated with new tools part of https://github.com/galaxyproject/tools-iuc/issues/488
- [x] https://github.com/galaxyproject/galaxy/issues/1679 - Track promotion of Kraken reference data to http://usegalaxy.org
- [x] https://github.com/galaxyproject/galaxy/issues/3096 - Add more genomes to SnpEff
- [x] https://github.com/galaxyproject/galaxy/issues/2128 - Restore Fetch Taxonomic Rep tool to http://usegalaxy.org. Closed - Use Kraken instead.
- [x] https://github.com/galaxyproject/galaxy/issues/1885 - Fetch Fasta DM config issue
- [x] https://github.com/galaxyproject/galaxy/issues/2530 - Add in latest dbkey to builds.txt
- [x] https://github.com/galaxyproject/galaxy/issues/2922 - Improve UI admin access to locally cached data index files

New genomes and indexes will be installed at https://test.galaxyproject.org/ first for testing. If your genome is listed and checked as complete, community testing and feedback can be posted to https://biostar.usegalaxy.org or through a bug report from the error dataset (from a mapping tool, etc).

All data will later be promoted to http://usegalaxy.org. Timeline is not firm.

[x] Indexes Galaxy Main April 2016. https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-208442266
[ ] Indexes Galaxy Main 2018. https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-208444904

Making a Reference genome request

Create reply below
Be specific
- Name of organism, including common "key" used
- Exact source. Ex: UCSC (dbkey), NCBI project ID, other URL
- Build details: include mito, chloro, plasmids, etc.
- All indexes are now generated for brand new genomes by default. Or pick one of: "All" "Bowtie2" "MyFavTool" if your genome is at http://usegalaxy.org, but not available in your tool of interest
- Don't forget anyone can add and use a custom genome right now with most tools - no waiting! https://wiki.galaxyproject.org/Learn/CustomGenomes

jennaj commented 8 years ago

For reference,

Master spreadsheet of dbkeys and indexes done and to-do. Older genomes removed. https://docs.google.com/spreadsheets/d/1jtDC-2STroUINP6KVrfhZwGQgpP5y-HhkRMONZtD1W4/edit?usp=sharing

dbkeys with fasta loaded. https://gist.github.com/jennaj/aeb8d6af4e4722a89f62d15af8ce3452

jennaj commented 8 years ago

Issues detected

Meets goal of consistency in nomenclature, permitting DMs to function

Change the genome label in all_fasta (remove "full" from all descriptions/dbkeys). Other loc files may need modifications.

[ ] ci2full to ci2 (dbkey & description)
[ ] cb3full to cb3 (dbkey & description)
[ ] "panTro3 Full" to "panTro3" (description only, no "Canonical" exists)
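
The relabeling in this checklist could be scripted along these lines. This is only a minimal sketch: it assumes a tab-separated all_fasta-style line with four columns (entry ID, dbkey, description, path), and the function name `relabel` and the `DBKEY_RENAMES` mapping are hypothetical. It leaves the path column untouched, so files on disk and other loc files would still need separate handling.

```python
import re

# Hypothetical mapping taken from the checklist above.
DBKEY_RENAMES = {"ci2full": "ci2", "cb3full": "cb3"}


def relabel(line):
    """Return one loc line with "full" removed from the ID, dbkey, and description.

    Assumes four tab-separated columns: entry ID, dbkey, description, path.
    Comment lines and rows with a different column count are returned unchanged.
    """
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    if stripped.startswith("#") or len(fields) != 4:
        return stripped
    value, dbkey, description, path = fields
    value = DBKEY_RENAMES.get(value, value)
    dbkey = DBKEY_RENAMES.get(dbkey, dbkey)
    # Drop a standalone word "full" (any case) from the description only.
    description = re.sub(r"\s*\bfull\b", "", description, flags=re.IGNORECASE).strip()
    return "\t".join([value, dbkey, description, path])
```

Running it over a copy of the loc file first, and diffing the result, would show exactly which rows change before anything is replaced.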

jennaj commented 8 years ago

Genomes that need followup:

Not at http://genome.ucsc.edu (browser or download). Are at http://genome-test.cse.ucsc.edu/. Pending release?

[ ] rheMac7 Rhesus (in builds list, but that does not contain genome-test anymore. odd)
[ ] rheMac8 Rhesus (not in builds list, can be captured next update)

jennaj commented 8 years ago

Completed and indexes promoted to http://usegalaxy.org (Galaxy Main) April 2016

Fasta

New genomes (confirmed, to be indexed for all)

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[x] danRer10 Zebrafish (not in builds list - created dbkey but it did not populate to http://usegalaxy.org. Impact: all data, including indexes, does not populate on mapping tool forms, etc. See https://github.com/galaxyproject/galaxy/issues/2530)
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi (not in builds list - created dbkey. Same issue as danRer10.)
[x] felCat8 Cat

2bit

Note: Lastz indexes created by same DM

New genomes

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[x] danRer10 Zebrafish
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi
[x] felCat8 Cat

Existing

[x] melUnd1 Budgerigar
[x] bosTau7 Cow

Sam

New genomes

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[x] danRer10 Zebrafish
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi
[x] felCat8 Cat

Existing

Picard

New genomes

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[x] danRer10 Zebrafish
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi
[x] felCat8 Cat

Existing

Bowtie2/Tophat2

Issue about Bowtie2 DM creating duplicate indexes: https://github.com/galaxyproject/tools-devteam/issues/319

New genomes

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[x] danRer10 Zebrafish
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi
[x] felCat8 Cat

Existing

[x] galGal3 Chicken Full & Canonical
[x] galGal4 Chicken
[x] melGal1 Turkey
[x] equCab1 Equus caballus
[x] equCab2 Equus caballus
[x] loxAfr1 African Elephant (duplicate in Bowtie2 - DM does not allow Tophat2 only creation - ticket)
[x] loxAfr3 African Elephant
[x] sacCer2 S. cerevisiae
[x] sacCer3 S. cerevisiae
[x] Schizosaccharomyces_pombe_1.1
[x] rheMac2 Rhesus
[x] rheMac3 Rhesus
[x] eschColi_K12 Escherichia coli (str. K-12 substr. MG1655)
[x] melUnd1 Budgerigar

BWA/BWA-MEM

New genomes

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[ ] danRer10 Zebrafish
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi (built using BWT-SW)
[x] felCat8 Cat

Existing

[x] equCab1 Equus caballus
[x] equCab2 Equus caballus
[x] sacCer2 S. cerevisiae
[x] sacCer3 S. cerevisiae (created dup, needs cleanup)
[x] Schizosaccharomyces_pombe_1.1
[x] rheMac2 Rhesus
[x] rheMac3 Rhesus
[x] galGal3 Chicken (full) (built using BWT-SW)
[x] galGal3 Canonical
[x] galGal4 Chicken
[x] melGal1 Turkey
[x] loxAfr1 African Elephant
[x] loxAfr3 African Elephant
[x] bosTauMd3 Cow
[x] ce9 C. elegans
[x] susScr2 Pig
[x] canFam2 Dog
[x] canFam3 Dog
[x] eschColi_K12 Escherichia coli (str. K-12 substr. MG1655)
[x] papHam1 Baboon
[x] melUnd1 Budgerigar (built using BWT-SW)
[x] otoGar1 Bushbaby
[x] otoGar3 Bushbaby
[x] felCat5 Cat
[x] panTro3 Chimpanzee Full & Canonical
[x] panTro4 Chimpanzee
[x] turTru2 Dolphin

HISAT2

New genomes

[x] rn6 Rat
[x] dm6 Fruit Fly
[x] musFur1 Ferret
[x] cerSim1 White Rhino
[x] nomLeu3 Gibbon
[ ] danRer10 Zebrafish
[x] bosTau8 Cow
[x] papAnu2 Baboon
[x] vicPac1 Alpaca
[x] vicPac2 Alpaca
[x] allMis1 American alligator
[x] dasNov3 Armadillo
[x] gadMor1 Atlantic cod
[x] panPan1 Bonobo
[x] aptMan1 Brown Kiwi
[x] felCat8 Cat

Existing

[x] hg38
[x] hg38canon
[x] hg38female
[x] hg19
[x] hg19canon
[x] hg19female
[x] hg19_rCRS_pUC18_phiX174
[x] hg_g1k_v37 1000Genomes
[x] mm10
[x] mm9
[x] dm3
[x] equCab1 Equus caballus
[x] equCab2 Equus caballus
[x] sacCer2 S. cerevisiae
[x] sacCer3 S. cerevisiae
[x] Schizosaccharomyces_pombe_1.1
[x] rheMac2 Rhesus
[x] rheMac3 Rhesus
[x] galGal3 Chicken Full & Canonical
[x] galGal4 Chicken
[x] melGal1 Turkey
[x] loxAfr1 African Elephant
[x] loxAfr3 African Elephant
[x] ce9 C. elegans
[x] ce10 C. elegans
[x] susScr2 Pig
[x] susScr3 Pig
[x] bosTauMd3 Cow
[x] bosTau7 Cow
[x] canFam2 Dog
[x] canFam3 Dog
[x] eschColi_K12 Escherichia coli (str. K-12 substr. MG1655)
[x] papHam1 Baboon
[x] melUnd1 Budgerigar
[x] otoGar1 Bushbaby
[x] otoGar3 Bushbaby
[x] felCat5 Cat
[x] panTro3 Chimpanzee Full & Canonical
[x] panTro4 Chimpanzee
[x] turTru2 Dolphin

Liftover

See distinct tracking checklist, below

jennaj commented 8 years ago

2018

Fasta

New genomes (confirmed, to be indexed for all)

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[x] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[x] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

2bit

Note: Lastz indexes created by same DM

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] Arabidopsis_thaliana_TAIR10 (check if exists)
[ ] Add here

Sam

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] Add here

Picard

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] Add here

Bowtie2/Tophat2

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] danRer9 Zebrafish (double check if needed)

BWA/BWA-MEM

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] danRer9 Zebrafish
[ ] danRer10 Zebrafish

HISAT2

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] danRer9 Zebrafish
[ ] danRer10 Zebrafish

Liftover

See distinct tracking checklist, below

RNA STAR

Fast tracked genomes https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-307517254

New genomes

[ ] Arabidopsis_thaliana_TAIR10 (exists in all_fasta.loc, check others)
[ ] hg38Patch2 - (GRCh38.p2 Human) source NCBI or UCSC? Check chrom IDs in .len
[ ] anaPla1 Mallard duck Apr 2013 (BGI_duck_1.0/anaPla1) - source UCSC test server
[ ] ce11 C. elegans Feb. 2013 (WBcel235/ce11)
[ ] criGri1 Chinese hamster
[ ] fr3 Fugu Oct. 2011 (FUGU5/fr3) (fr3)
[ ] galGal5 Dec 2015 (Gallus_gallus-5.0/galGal5)
[ ] latCha1 Coelacanth
[ ] oviAri3 Aug. 2012 (ISGC Oar_v3.1/oviAri3)
[ ] rheMac8 Nov. 2015 (BCM Mmul_8.0.1/rheMac8)
[ ] susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11)

Existing

[ ] danRer9 Zebrafish
[ ] danRer10 Zebrafish

jennaj commented 8 years ago

New genomes under review (source/licence)

[ ] Dolphin NIST Tur_tru v1 https://www.ncbi.nlm.nih.gov/assembly/GCF_001922835.1/ (check that UCSC does not plan to publish, re: http://hgdownload.soe.ucsc.edu/downloads.html#dolphin)
[ ] Catfish IpCoco_1.2 https://www.ncbi.nlm.nih.gov/assembly/GCF_001660625.1/
[ ] Ovis aries Oar_v4.0, from late 2015 (NCBI)
[ ] X. laevis - ftp://ftp.xenbase.org/pub/Genomics/JGI/Xenla9.1/
[ ] Triticum_aestivum.IWGSP1.23 (Wheat) - one source, NCBI: http://ensembl.gramene.org/Triticum_aestivum/Info/Index
[ ] Macaca fascicularis - confirm source: details https://trello.com/c/mJWnAuuQ
[ ] American Mink (Neovison vison) - choose source, not at UCSC
[ ] Streptococcus pyogenes MGAS5005 NC_007297.1 (under review)
[ ] Streptococcus pyogenes MGAS315 NC_004070.1 (under review)
[ ] Streptococcus pyogenes MGAS9429 NC_008021.1 (under review)
[ ] Streptococcus pyogenes MGAS6180 NC_007296.1 (under review)
[ ] macFas5 Macaca fascicularis Crab-eating macaque (not full browser/in UCSC downloads - Test only)
[ ] macNem-X Macaca nemestrina Pig-tailed macaque (not full browser/in UCSC downloads - not Test, but used in phylo)
[ ] Pinus taeda ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000404065.2_Ptaeda1.01
[ ] Picea glauca ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000411955.5_PG29_v4.1

jennaj commented 8 years ago

Liftover

Needs DM: https://github.com/galaxyproject/galaxy/issues/1904

Workaround: Use the LiftOver tool at UCSC (the source for the wrapped version in Galaxy) and upload the results to Galaxy to use with other analysis. http://genome.ucsc.edu/cgi-bin/hgLiftOver

New genomes

[ ] rn6 Rat
[ ] dm6 Fruit Fly
[ ] musFur1 Ferret
[ ] cerSim1 White Rhino
[ ] nomLeu3 Gibbon
[ ] danRer10 Zebrafish
[ ] danRer9 Zebrafish (update)
[ ] bosTau8 Cow
[ ] papAnu2 Baboon
[ ] vicPac1 Alpaca
[ ] vicPac2 Alpaca
[ ] allMis1 American alligator
[ ] dasNov3 Armadillo
[ ] gadMor1 Atlantic cod
[ ] panPan1 Bonobo
[ ] aptMan1 Brown Kiwi
[ ] melUnd1 Budgerigar (only had .fa, sam, picard originally - odd)
[ ] felCat8 Cat
[ ] criGri1 Chinese hamster
[ ] latCha1 Coelacanth

Existing (update)

[ ] Probably all - need automated retrieval of new, ignore existing

natefoo commented 8 years ago

@jennaj I updated the April 2016 comment to include the missing BWA indexes that I was able to build with the BWT-SW algorithm.

Some (like galGal3 and panTro3) with full/canonical variants I rebuilt. The only difference from the original DM run is that after selecting the correct build from "Source FASTA Sequence", I put the build variant name (e.g. galGal3canon) in the "ID for sequence" field. Otherwise the builds clobber each other in the index dir on disk. The "ID for sequence" field is used for naming the index subdirectory and defaults to the dbkey, which for both full and canonical builds is still just e.g. galGal3. This could be a bit more intuitive in the DM: I had no idea what "ID for sequence" was for until I noticed that two loc file entries pointed to the same directory/indexes on disk and then dug into the DM code to understand it. I rebuilt these for any indexes which had the variants built originally, and cleaned up the old directories and their entries in the location files.
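
The symptom described above, two loc file entries pointing at the same index location, can be checked for with a small script. This is a sketch rather than anything the DM itself does: it assumes tab-separated loc lines whose first column is the entry ID and whose last column is the index path, and the function name `find_shared_index_paths` is made up for illustration.

```python
import posixpath
from collections import defaultdict


def find_shared_index_paths(loc_lines, path_column=-1):
    """Map each index path used by more than one loc entry to those entry IDs.

    Assumes tab-separated lines; column 0 is the entry ID and `path_column`
    holds the index path. Blank lines and '#' comments are skipped.
    """
    ids_by_path = defaultdict(list)
    for line in loc_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        path = posixpath.normpath(fields[path_column])
        ids_by_path[path].append(fields[0])
    return {path: ids for path, ids in ids_by_path.items() if len(ids) > 1}
```

An empty result means every entry has its own path; any key in the result is a location where one build may have overwritten another.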

These BWA indexes and the rest of the indexes in that comment are now being published to CVMFS. Once that is done (this may take a long time) they will be available on usegalaxy.org after a restart. I'll comment again when it's all ready.

natefoo commented 8 years ago

@jennaj The publishing is finished and Main has been restarted.

jennaj commented 8 years ago

Add hg38 MAF alignments. Request: https://biostar.usegalaxy.org/p/17690

massaali commented 7 years ago

Hello,

I saw 7 weeks ago that another user had made this same request for a newer version of the sheep reference genome. You currently have OviAri1, which is 6 years old, and there are two newer versions (about to be 3 newer versions). Could we get a newer version? Sheep are an amazing agricultural species, important for meat, milk, and wool production, and more researchers should study them! I request the current version on NCBI/ENSEMBL for all tools: Bowtie and other mapping tools, and ChIP-seq and RNA-seq tools too. It is Ovis aries Oar_v4.0, from late 2015.

Thank you for considering!

sayalih commented 7 years ago

Hi

I think the best way is to use your genome of interest: take the fasta file and upload it to Galaxy using FileZilla. There is then an option to align against your uploaded sequence instead of a built-in reference genome. Instructions: https://wiki.galaxyproject.org/Support#Custom_reference_genome

I don't think they are adding any more reference genomes to their default list.

Sayali.

Update by @jennaj: Yes, use a custom reference genome for now. I will add in sheep and other requests to the next list of updates https://github.com/galaxyproject/galaxy/issues/1470#issuecomment-208444904

vebaev commented 7 years ago

It would be great if you could include the tomato 2.40 genome from: ftp://ftp.solgenomics.net/tomato_genome/ and pepper C.annuum_cvCM334 from: ftp://ftp.solgenomics.net/genomes/Capsicum_annuum/C.annuum_cvCM334/

@jennaj Yes, they are in NCBI (tomato and pepper): https://www.ncbi.nlm.nih.gov/genome/7 https://www.ncbi.nlm.nih.gov/genome/10896

jennaj commented 7 years ago

Priority indexes

RNA STAR

[x] Deployed and functions on Test https://test.galaxyproject.org
[x] Deployed and functions on Main https://usegalaxy.org

Indexes

[x] hg38
[x] hg19
[x] mm10
[x] mm9
[x] rn6
[x] rn5
[x] dm6
[x] dm3
[x] sacCer3
[x] sacCer2
[x] ce9
[x] ce10

Future requests (may be moved to a new post in this same issue)

[ ] bosTau8

iraplee commented 7 years ago

We're looking for an X. tropicalis index to be added for HISAT2

xenTro1 xenTro1 Frog (Xenopus tropicalis): xenTro1 /galaxy/data/xenTro1/seq/xenTro1.fa
xenTro2 xenTro2 Frog (Xenopus tropicalis): xenTro2 /galaxy/data/xenTro2/seq/xenTro2.fa
xenTro3 xenTro3 Frog (Xenopus tropicalis): xenTro3 /galaxy/data/xenTro3/seq/xenTro3.fa

bimbam23 commented 7 years ago

New Pig genome: Sus Scrofa 11.1, susscr4 NCBI GCF_000003025.6 all: (chr and chrUn plus chrMT)

genome: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/025/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.fna.gz
gff3: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/025/GCF_000003025.6_Sscrofa11.1/GCF_000003025.6_Sscrofa11.1_genomic.gff.gz

lookup table nice names: https://test.galaxyproject.org/u/bickj/h/pig-genome-lookup-table

PseudomonasP commented 6 years ago

Dear Galaxy Team, I hope this is still the right place to request genome additions.

If we could get Brassica napus (Bna) as a built-in genome, that would be amazing: http://www.genoscope.cns.fr/brassicanapus/data/

Please note that although the annotation is titled v5 while the genome itself is v4.1, it should work just fine, as we have had no problems with it.

jennaj commented 6 years ago

Add NCBI's Xenopus laevis and Xenopus tropicalis genomes (indexed for all tools).

The genome is at https://usegalaxy.eu -- so when we get the data synced between all mirrors that might be the best solution.

Request: https://biostar.usegalaxy.org/p/27778

jennaj commented 5 years ago

Request: add Medicago truncatula https://biostar.usegalaxy.org/p/5916/#30132

To-do: Check if present in the ELIXIR plant genomes already indexed (to be added in CVMFS): https://www.elixir-europe.org/about/groups/galaxy-wg

jennaj commented 5 years ago

Request: add https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Canis_lupus_dingo/100/ re: https://help.galaxyproject.org/t/dingo-reference-genome-upload-request/529

To-do: Check if UCSC has processed the genome

JulienLeclercq commented 5 years ago

Dear Galaxy Team,

Thanks for your amazing work. Please kindly consider adding the following genome to Galaxy Main: Mexican tetra (Astyanax mexicanus). The genome is available at NCBI: https://www.ncbi.nlm.nih.gov/genome/?term=astyanax+mexicanus and the annotation too: https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Astyanax_mexicanus/102/

Please note that the genome is version 2.0 and made from the surface eco-morphotype (unlike the previous version 1.02 from cave eco-morphotype).

In the meantime, I am working with a custom genome.

Best, Julien

hexylena commented 4 years ago

Migrate to usegalaxy-playbook?

jennaj commented 4 years ago

Request:

Genome: Citrus sinensis v1.1

Source: https://www.citrusgenomedb.org/bio_data/79

jennaj commented 4 years ago

Request:

Human herpesvirus 1 with ref accession number NC_001806

jennaj commented 4 years ago

Request:

Genome: Tribolium castaneum genome assembly (Tcas5.2)

Source: https://www.ncbi.nlm.nih.gov/genome?term=tribolium%20castaneum

jennaj commented 4 years ago

Request:

Dada tools: https://github.com/galaxyproject/usegalaxy-playbook/issues/273

psyi commented 4 years ago

Dear Galaxy Team,

It would be great if the genome and annotation release of Physcomitrella patens can be added to Galaxy Main.

They are available at NCBI: https://www.ncbi.nlm.nih.gov/genome/383 https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Physcomitrella_patens/100

Best, Peishan

jennaj commented 4 years ago

@echoyps & everyone else with genome requests:

We will now be adding new genomes over the next few months and throughout the upcoming year. Please continue to post requests here. UCSC and NCBI are the preferred data sources. Others are possible. However requested, be specific.

Reminders:

- Anyone can use genome (or transcriptome/exome) fasta data as a custom genome "from the history" now -- you do not need to wait for us to index server-side.
- Annotation is supplied by the end-user from the history by default (even for built-in indexed genomes) -- with just a few tool exceptions, but those also accept annotation data from the history.
- Genomes (fasta) are the data that is currently indexed server-side. Annotation may be indexed in the future.
- Custom genomes (fasta) can be promoted to a custom build (User > Custom Builds) in order to create a custom "database" metadata key that can be assigned to datasets (some tools wrapped for Galaxy require that the "database" is assigned to inputs).

Be sure to format the genome fasta correctly (remove description content on the ">" title line) and make sure the genome build/version and chromosome identifiers are an exact match between the custom reference genome (fasta) and any reference annotation (gtf or gff3) you plan to use in your analysis. Do this before starting any analysis that uses the genome or promoting the fasta to a custom build. This will avoid problems later on. If there is a formatting problem (example: headers on a gtf dataset) or a chromosome mismatch between inputs, the fix usually means correcting the format and starting the analysis over from the very start, which can be frustrating. If you have a choice about annotation formats, choose the gtf version instead of the gff3 version -- a gtf formatted annotation dataset is accepted by more tools, and using the same exact annotation data throughout an analysis workflow is very important.
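
As an illustration of the title-line cleanup described above, here is a minimal sketch that keeps only the identifier on each ">" line. It is not a Galaxy tool; the function name `clean_fasta_headers` is hypothetical, and it assumes the identifier is the first whitespace-delimited token after ">".

```python
def clean_fasta_headers(lines):
    """Yield fasta lines with each '>' title line reduced to its identifier.

    Assumes the identifier is the first whitespace-delimited token after '>'.
    Sequence lines are passed through with the trailing newline removed.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            yield ">" + (tokens[0] if tokens else "")
        else:
            yield line
```

For example, a title line such as ">chr1 some description text" would come out as ">chr1", which is the form that can be matched against the first column of a gtf.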

Mapping jobs will usually not "fail" due to chromosome identifier mismatch issues. Instead, if the annotation is input during the mapping step, the annotation will not really be used, creating problematic scientific results that may not be obvious to detect. Tools used downstream with a mismatched genome+annotation can also produce problematic scientific results that are not obvious, or may fail outright with errors that are difficult to interpret. Problematic annotation formatting itself will also lead to problems. Try to avoid issues by preparing your inputs correctly at the start :)
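
Because such mismatches tend not to cause a visible failure, it can help to compare identifiers before running anything. The sketch below is an assumption-laden illustration, not part of Galaxy: the function names are hypothetical, fasta identifiers are taken as the first token after ">", and annotation identifiers as the first tab-separated column of each non-comment gtf line.

```python
def fasta_ids(lines):
    """Collect the identifier (first token after '>') from each fasta title line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_ids(lines):
    """Collect the first column of every non-blank, non-comment gtf line."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def annotation_only_ids(fasta_lines, gtf_lines):
    """Return sorted annotation identifiers that are absent from the fasta."""
    return sorted(gtf_ids(gtf_lines) - fasta_ids(fasta_lines))
```

A non-empty result from `annotation_only_ids` (for instance "2" in the annotation versus "chr2" in the genome) signals that the two inputs will not line up as they are.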

Finally, when loading these data with the Upload tool, allow the datatype to be detected instead of assigning it. This triggers basic format checks and a Galaxy-assigned datatype. If you do not get the expected datatype assigned, this almost always means that there is a formatting issue that needs to be addressed. Most format issues can be resolved within Galaxy. Once fixed, the correct datatype can be assigned: Click on the dataset's pencil icon > Edit Attributes forms > Datatypes tab > "detect datatype" (best choice) or assign it directly (be careful if choosing this option). If Galaxy cannot "detect" the format correctly, there is likely still a data content or format problem.

If you ever have a problem that you cannot figure out how to resolve, know that the vast majority of tool errors or unexpected results are due to input issues that can be fixed to achieve a successful and correct scientific/technical result. First, review the tool form help -- most have examples of the expected input's content and format. Next, review our Troubleshooting and other FAQs. If those do not resolve the issue, the Galaxy Help forum is a great place to review prior Q&A or to ask a novel question. The Galaxy Training Network (GTN) tutorials are also a very useful resource -- compare your methods to the examples.

I'm only posting this advice here now since it hasn't been covered for a while at Github, and there are newer related FAQs plus prior Q&A available. Any followup/clarification should be asked about at Galaxy Help (not here).

The FAQs/links below will help with all of the above.

All FAQs: https://galaxyproject.org/support/

Start with these to learn how to use a custom genome and the associated annotation:

Common datatypes explained
Preparing and using a Custom Reference Genome or Build
Mismatched Chromosome identifiers (and how to avoid them)
Extended Help for Differential Expression Analysis Tools -- formatting help included usually applies across tools and analysis goals, not just DE analysis, and whether the analysis is executed in Galaxy or otherwise.

Error or unexpected result FAQ:

My job ended with an error. What can I do?

Galaxy Help forum:

https://help.galaxyproject.org/
Troubleshooting post: https://help.galaxyproject.org/t/troubleshooting-resources-for-errors-or-unexpected-results/42

GTN Tutorials:

https://training.galaxyproject.org/

Thanks! Jen

dram26 commented 3 years ago

Hi!

Could you kindly add the Macaca fascicularis genome for BWA? It's now at NCBI: https://www.ncbi.nlm.nih.gov/genome/776

[There is a 2015 petition for the same here: https://trello.com/c/mJWnAuuQ/1511-reference-genome-requests-for-http-usegalaxyorg (by AmyK, Feb 4, 2015 at 6:47 PM) and another in 2016, but I guess the source wasn't validated then?]

Best! David

galaxyproject / usegalaxy-playbook

Genome Additions Master Ticket #242

Genome and indexes for CVMFS and http://usegalaxy.org
