NAL-i5K / Organism_Onboarding

A workflow that makes the organism onboarding pipeline easy to handle as an I/O pipeline

Merge localdata #135

Closed: ZhiXuanLai closed this pull request 2 years ago

ZhiXuanLai commented 2 years ago

I'm considering whether the conditional for the MD5 download is required, or whether we can simplify the code and always download the MD5 file.
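For illustration, here is a minimal Python sketch of what the "always download the MD5 file" option could look like once the checksum listing is on disk. It is not the workflow's actual code. The function names (`parse_md5_listing`, `md5_of_file`, `verify`) are hypothetical, and the listing is assumed to use the common `md5sum`-style layout of `<digest>  ./<file name>` per line.

```python
import hashlib
from pathlib import Path


def parse_md5_listing(text):
    """Map file name -> expected MD5 digest.

    Assumes md5sum-style lines: '<digest>  ./<file name>'.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # Keep only the base name so './x.gz' and 'x.gz' match.
        expected[Path(name.strip()).name] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, listing_text):
    """Return True if the file's MD5 matches its entry in the listing."""
    expected = parse_md5_listing(listing_text)
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no checksum entry for {name}")
    return md5_of_file(path) == expected[name]
```

With the listing always fetched, every downloaded file goes through the same `verify` step and no branch is needed to decide whether a checksum is available.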

mpoelchau commented 2 years ago

@ZhiXuanLai this looks pretty good! I made a few edits to the .yml file and pushed them. My only other comment is that in the gene prediction readme, the protein fasta file has quotation marks and brackets around it - perhaps this is due to removing the array/scatter for this?

The following files were retrieved from NCBI on 2022-12-4:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/298/625/GCF_001298625.1_SEUB3.0/GCF_001298625.1_SEUB3.0_genomic.gff.gz
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/298/625/GCF_001298625.1_SEUB3.0/GCF_001298625.1_SEUB3.0_cds_from_genomic.fna.gz
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/298/625/GCF_001298625.1_SEUB3.0/GCF_001298625.1_SEUB3.0_rna_from_genomic.fna.gz
["https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/298/625/GCF_001298625.1_SEUB3.0/GCF_001298625.1_SEUB3.0_translated_cds.faa.gz"]
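One plausible explanation for the brackets and quotation marks on the last line, in keeping with the guess above, is that the protein FASTA value still reaches the readme step as a one-element array and is serialized as a whole rather than item by item. The Python sketch below only illustrates that effect and a possible normalization. `render_file_entry` is a hypothetical helper, not part of the workflow.

```python
import json

PROTEIN_URL = (
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/298/625/"
    "GCF_001298625.1_SEUB3.0/GCF_001298625.1_SEUB3.0_translated_cds.faa.gz"
)


def render_file_entry(value):
    """Return the plain lines to write for one readme entry.

    Accepts either a single string or a list of strings, so the output
    is the same whether or not the input is wrapped in an array.
    """
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


# Serializing the array as a whole reproduces the bracketed, quoted form.
serialized = json.dumps([PROTEIN_URL])

# Normalizing first yields the bare URL in both cases.
from_list = render_file_entry([PROTEIN_URL])
from_string = render_file_entry(PROTEIN_URL)
```

Here `serialized` starts with `["` and ends with `"]`, while `from_list` and `from_string` both contain just the bare URL.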