Just finished my rescheduled meeting. The format we agreed on is:
The existing species_releases folder will contain per-species folders. When you're ready to start downloading, you'll move the existing ones into old_data, but leave nextflow there until you've rewritten the code.
Inside each species folder, the structure will look like this:
Ensembl
    GRCh38
        Release 103
            BED
            FASTA
            GTF
            INDEXES
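A rough sketch of how that layout could be created, assuming the levels nest as source / assembly / release with the four data folders at the bottom. The function and argument names are hypothetical; only the folder names come from the agreed format.

```python
from pathlib import Path

# Subfolders agreed for each release; names copied from the layout above.
RELEASE_SUBFOLDERS = ("BED", "FASTA", "GTF", "INDEXES")


def make_release_tree(base, species, source, assembly, release):
    """Create <base>/<species>/<source>/<assembly>/<release>/ and its subfolders.

    All argument names are hypothetical placeholders for illustration.
    """
    release_dir = Path(base) / species / source / assembly / release
    for sub in RELEASE_SUBFOLDERS:
        # exist_ok=True makes it safe to re-run for a release that already exists
        (release_dir / sub).mkdir(parents=True, exist_ok=True)
    return release_dir
```

For example, make_release_tree("species_releases", "Homo_sapiens", "Ensembl", "GRCh38", "Release 103") would build one release; the species folder name here is only an illustration.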
In the FASTA folder you will keep the original file names, but also duplicate the genome file under a standard naming scheme for nextflow. You'll add .genome or similar to the name to indicate what the file is. You'll also download the cDNA FASTA to this folder.
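One possible way to do the duplication, as a minimal sketch. The naming pattern <species>.<assembly>.genome.fa (plus .gz if the original is compressed) is an assumption, since the exact scheme is still open, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_genome_with_standard_name(original, species, assembly):
    """Copy the genome FASTA next to itself under a standardised name.

    The original file is left untouched. The target pattern is an assumed
    example of the ".genome or similar" idea, not an agreed scheme.
    """
    original = Path(original)
    # Keep a trailing .gz so a compressed file stays recognisable
    ext = ".fa.gz" if original.name.endswith(".gz") else ".fa"
    target = original.with_name(f"{species}.{assembly}.genome{ext}")
    if not target.exists():
        # copy2 also preserves timestamps, which helps trace the download date
        shutil.copy2(original, target)
    return target
```

Copying rather than renaming keeps the original Ensembl file name in place, so both names exist in the FASTA folder.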
For indexes, the possibilities will be Bowtie, Bowtie2, Hisat2, STAR (both the version in genomics/soft/bin and the nextflow version; folder names should indicate which version they were made with), Hi-CUP, 10X and PARSE. Not all will be made for all species. Will they be made for all releases? Can your code be release specific?
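If it helps, here is a sketch of one way the code could be release specific: a lookup keyed by species and release that lists which indexes to build, plus a helper that puts the tool version into the folder name. The plan contents, the default, the <tool>_<version> pattern and the version string in the usage note are all made-up assumptions for illustration.

```python
from pathlib import Path

# Hypothetical plan: which indexes to build for a given (species, release).
INDEX_PLAN = {
    ("Homo_sapiens", "Release 103"): ["Bowtie2", "STAR", "10X"],
}

# Assumed fallback for any species/release not listed above.
DEFAULT_INDEXES = ["Bowtie2"]


def indexes_for(species, release):
    """Return the list of index types to build for this species and release."""
    return INDEX_PLAN.get((species, release), DEFAULT_INDEXES)


def index_dir(release_dir, tool, version):
    """Build an index folder path whose name records the tool version used."""
    return Path(release_dir) / "INDEXES" / f"{tool}_{version}"
```

With this, index_dir(release_dir, "STAR", "X.Y.Z") would give a folder named STAR_X.Y.Z inside INDEXES, so the genomics/soft/bin build and the nextflow build of STAR can sit side by side.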
It would be good to add the Human T2T assembly as an option as well.