Open davmlaw opened 8 months ago
It would be good to automate uploading releases, as doing it by hand is pretty tedious. We could do:
gh release create <tag> --title "<release title>" --notes "<release notes>"
gh release upload <tag> <path/to/your/files/*>
I made a script, "generate_transcript_data/github_release_upload.sh", which makes creating a release easier.
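The script itself isn't shown in this thread, so the following is only a rough sketch of how the two `gh` calls above could be wrapped. It is written in Python rather than bash, and the function names, tag, title, notes and file name are all made up for illustration. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_release_commands(tag, title, notes, files):
    """Return the gh invocations as argv lists; nothing is executed here."""
    commands = [
        ["gh", "release", "create", tag, "--title", title, "--notes", notes],
    ]
    if files:  # only add the upload step when there is something to upload
        commands.append(["gh", "release", "upload", tag, *map(str, files)])
    return commands


def publish(tag, title, notes, files, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in build_release_commands(tag, title, notes, files):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command


# Hypothetical tag and file name, dry run only (no call to gh is made).
publish("data_v0.0.0", "Example release", "Example notes", ["example.json.gz"])
```

Keeping the command construction separate from execution means the arguments can be checked without touching GitHub.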
Looking at the bash scripts, a lot of the complexity comes from looping over URLs and dealing with RefSeq URLs that have identical file names, e.g.:
"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20190906/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20201022/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
So it's not as easy as just downloading each file and carrying on, because the downloads would overwrite each other. I think with Snakemake we should explicitly list everything out in YAML files, and use that config to run one pipeline common to all annotation sources.
We could make urls a dictionary, with the "nice name" for each URL as the key. That would let us move code into config, which would be a lot nicer.
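To illustrate the dictionary idea, here is a small sketch using the three RefSeq URLs quoted above. The key names, the `downloads` directory and the helper functions are hypothetical, not what the pipeline actually uses.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606"
GFF = "GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"

# Keys are made-up "nice names"; the values are the three URLs from this thread.
urls = {
    "refseq_grch37_105.20190906": f"{BASE}/105.20190906/{GFF}",
    "refseq_grch37_105.20201022": f"{BASE}/105.20201022/{GFF}",
    "refseq_grch37_105.20220307": f"{BASE}/105.20220307/{GFF}",
}


def remote_name(url):
    """File name as it appears on the server (identical for all three URLs)."""
    return PurePosixPath(urlparse(url).path).name


def local_path(nice_name, url, download_dir="downloads"):
    """Put each download under its nice name so the files cannot collide."""
    return f"{download_dir}/{nice_name}/{remote_name(url)}"


for name, url in urls.items():
    print(local_path(name, url))
```

All three URLs share one remote file name, but each gets its own local path because the dictionary key is part of that path.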
OK, I have started on this (in generate_transcript_data).
I wanted to run the code with different config files, but couldn't work out a way to do it. As far as I can tell, Snakemake only wants one config file, so I combined everything in "config/*.yaml" into "cdot_transcripts.yaml".
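For reference, combining several per-source configs into one can be sketched as a recursive dictionary merge. This works on dictionaries that have already been parsed (reading the YAML files would need a parser such as PyYAML, which is left out here). The keys and values below are invented for illustration and are not the layout of "cdot_transcripts.yaml".

```python
import copy


def _merge_into(target, source):
    """Recursively copy source into target; later values win on conflicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def merge_configs(*configs):
    """Merge configs left to right into a new dict, leaving inputs untouched."""
    merged = {}
    for config in configs:
        _merge_into(merged, config)
    return merged


# Hypothetical per-source configs, standing in for the parsed config/*.yaml files.
refseq = {"urls": {"refseq_example": "https://example.org/refseq.gff.gz"}}
ensembl = {"urls": {"ensembl_example": "https://example.org/ensembl.gtf.gz"}}

combined = merge_configs(refseq, ensembl)
print(sorted(combined["urls"]))
```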
I'm having an issue at the moment with ambiguous rules for downloading files.
@tedil @holtgrewe - I've finished v1 of the Snakemake pipeline. Could you check it out, as it's the first one I've ever written:
https://github.com/SACGF/cdot/blob/main/generate_transcript_data/Snakefile
https://github.com/SACGF/cdot/blob/main/generate_transcript_data/cdot_transcripts.yaml
Happy to hear feedback / if I should have structured it a different way etc.
Great, thank you! I will have a look when I am back from vacation
Sure, no hurry, enjoy your time off
At the moment we have file existence tests instead of proper dependency management
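To make the difference concrete, here is a small sketch contrasting a plain existence test with a timestamp-based check, which is roughly the kind of comparison a workflow tool makes. The function and file names are hypothetical and this is not code from the existing scripts.

```python
import os
import tempfile


def is_up_to_date(target, inputs):
    """True only if target exists and is at least as new as every input.

    Raises FileNotFoundError if an input is missing.
    """
    if not os.path.exists(target):
        return False
    target_mtime = os.path.getmtime(target)
    return all(os.path.getmtime(path) <= target_mtime for path in inputs)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "input.gff.gz")
    output = os.path.join(tmp, "output.json.gz")
    for path in (source, output):
        with open(path, "w") as handle:
            handle.write("placeholder")

    # Pretend the input was re-downloaded after the output was built.
    os.utime(output, (1_000, 1_000))
    os.utime(source, (2_000, 2_000))

    existence_says_done = os.path.exists(output)
    mtime_says_done = is_up_to_date(output, [source])

print(existence_says_done, mtime_says_done)  # prints: True False
```

The existence test treats the stale output as finished, while the timestamp check flags it for rebuilding.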