emmaco closed 2 years ago
@emmaco Great work on the pipelines! It is a huge improvement over our previous setup. Here are a few suggestions:
It might be useful to consolidate or better parametrise the Nextflow configs used by the release and PDB mapping pipelines. Currently there are many parameters, a lot of them shared across different configs, so it is easy to forget to update one of them and end up with errors that are hard to debug.
Examples: https://github.com/Rfam/rfam-production/blob/release/pdb_mapping/nextflow.config#L5 https://github.com/Rfam/rfam-production/blob/release/scripts/release/workflows/stage_rfam_live.nf#L3 https://github.com/Rfam/rfam-production/blob/release/scripts/release/workflows/nextflow.config
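One possible way to do this, as a rough sketch only: keep the shared parameters in a single file and pull it into each pipeline's `nextflow.config` with `includeConfig`. The file name `shared.config`, its location, and the parameter names below are hypothetical and are not taken from the repository.

```groovy
// shared.config (hypothetical file): parameters used by both the release
// and the PDB mapping pipelines, defined in exactly one place.
params {
    release_version = '14.8'        // hypothetical parameter name
    notify_email    = 'changeme'    // hypothetical parameter, placeholder value
}
```

```groovy
// pdb_mapping/nextflow.config: import the shared values, then add only
// what is specific to this pipeline.
// A relative path is resolved against the location of the including file.
includeConfig '../shared.config'

params {
    pdb_output_dir = 'changeme'     // hypothetical pipeline-specific parameter
}
```

With this layout a shared value only needs to be updated once, and each pipeline config lists just its own parameters.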
Given the upcoming migration to CODON, it is important to avoid hard-coding paths in the code as they will be different between CODON and the current cluster (and might be different yet again in a future cluster migration). To make the migrations easier, I suggest finding a way to move all /nfs/... paths into config files.
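As an illustration, Nextflow config profiles could hold the cluster-specific locations, so that the workflow code only ever refers to a parameter. This is a sketch under assumptions: the profile names, the `base_dir` parameter, and the placeholder paths are all hypothetical.

```groovy
// nextflow.config: one profile per cluster, each setting the same parameter.
// The paths are placeholders, not the real locations.
profiles {
    current_cluster {
        params.base_dir = '/path/on/current/cluster'
    }
    codon {
        params.base_dir = '/path/on/codon'
    }
}
```

The workflow scripts would then build paths from the parameter, for example `"${params.base_dir}/releases"`, and the cluster is chosen at launch time with `nextflow run <workflow> -profile codon`. A future migration would only need a new profile.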
There is some redundancy in clan competition steps in PDB mapping and in the release workflows:
https://github.com/Rfam/rfam-production/blob/release/scripts/release/workflows/clan_competition.nf#L17 https://github.com/Rfam/rfam-production/blob/release/pdb_mapping/pdb_mapping.nf#L94
If we ever find a bug in the clan competition step, a fix may not be propagated to the clan competition used in the PDB mapping step as the code is not reused.
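A sketch of how the step could be shared, assuming both workflows use Nextflow DSL2: define the process once in a module file and include it from both pipelines. The module path, the process name, its inputs and outputs, and `params.clan_files` are hypothetical, and the script body is a placeholder for the real clan competition commands.

```groovy
// modules/clan_competition.nf (hypothetical module file)
process CLAN_COMPETITION {
    input:
    path clan_files

    output:
    path 'clan_competition.done'

    script:
    """
    # placeholder: the actual clan competition commands would go here
    touch clan_competition.done
    """
}
```

```groovy
// In both the release workflow and the PDB mapping workflow
// (adjust the relative path to where the module actually lives):
include { CLAN_COMPETITION } from './modules/clan_competition'

workflow {
    // params.clan_files is a hypothetical parameter pointing at the input files
    CLAN_COMPETITION(Channel.fromPath(params.clan_files))
}
```

A bug fix in the module would then reach both pipelines automatically.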
I am fine with merging the pull request as is and fixing these issues while producing the next release or while migrating to Codon, as long as you keep track of these issues and they are added to the 14.8 GitHub project.
👏 This is really nice work Emma! You created a solid foundation for future improvements! Once we make a couple more releases, the code will be refined and the process will be even better!
They're great points, thanks Anton. I'm going to merge this and add the issues to the release 14.8 checklist.
This includes the workflows for each part of the release process, as well as a combined pipeline that runs all of the release steps that have been automated to date. It also includes an update to the README doc.