We need an automated way to validate FTP files before a release. Based on previous experience, I propose the following checks:
The number of FASTA files must match the expected number of families
No empty FASTA files
No .log files in the fasta_files folder
The clanin file is not empty and the number of lines is not smaller than in the previously released clanin file (it is possible that some clans may get smaller but this is rare; it is more likely that fewer lines indicates a problem)
The Rfam.cm file contains the expected number of families
The Rfam.cm file contains ACC and DESC fields
All database_files exist and are non-empty, and for each table there is an .sql and a .txt.gz file
Genome browser folder exists and is non-empty (until #21 is done, this level of checking is enough)
rfam2go and md5 files exist and are non-empty
The header of the rfam2go file contains the correct release number
Rfam.seed_tree.tar.gz contains the expected number of .seed_tree files
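A few of the checks above could be sketched in Python roughly as follows. This is only a starting point, not an agreed implementation: the function names, the `*.fa.gz` naming pattern for FASTA files, and the choice to count distinct `ACC` values as "number of families" in Rfam.cm are all assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def count_lines(path):
    """Count the lines in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def check_fasta_dir(fasta_dir, expected_families, pattern="*.fa.gz"):
    """Return a list of problems found in the fasta_files folder.

    Assumption: FASTA files match `pattern`; adjust to the real naming scheme.
    """
    fasta_dir = Path(fasta_dir)
    problems = []
    fasta_files = sorted(fasta_dir.glob(pattern))
    if len(fasta_files) != expected_families:
        problems.append(
            f"expected {expected_families} FASTA files, found {len(fasta_files)}"
        )
    # Zero-byte check only: a gzip archive of empty input is not zero bytes.
    for path in fasta_files:
        if path.stat().st_size == 0:
            problems.append(f"empty FASTA file: {path.name}")
    for path in sorted(fasta_dir.glob("*.log")):
        problems.append(f"stray log file: {path.name}")
    return problems


def check_clanin(clanin, previous_clanin):
    """Check that the clanin file is non-empty and has not lost lines."""
    new_lines = count_lines(clanin)
    old_lines = count_lines(previous_clanin)
    if new_lines == 0:
        return ["clanin file is empty"]
    if new_lines < old_lines:
        return [f"clanin has {new_lines} lines, previous release had {old_lines}"]
    return []


def check_cm(cm_file, expected_families):
    """Check Rfam.cm for DESC fields and the expected number of accessions.

    Assumption: one distinct ACC value per family, so repeated ACC lines
    for the same family are counted once.
    """
    accessions = set()
    has_desc = False
    with open(cm_file) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "ACC" and len(fields) > 1:
                accessions.add(fields[1])
            elif fields[0] == "DESC":
                has_desc = True
    problems = []
    if len(accessions) != expected_families:
        problems.append(
            f"expected {expected_families} families in CM file, found {len(accessions)}"
        )
    if not has_desc:
        problems.append("no DESC fields found in CM file")
    return problems


def check_seed_trees(tar_path, expected_count):
    """Count .seed_tree members inside Rfam.seed_tree.tar.gz."""
    with tarfile.open(tar_path, "r:gz") as archive:
        found = sum(
            1
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".seed_tree")
        )
    if found != expected_count:
        return [f"expected {expected_count} .seed_tree files, found {found}"]
    return []
```

Each function returns a list of problem descriptions, so a release script could concatenate the lists and fail if the result is non-empty. The remaining checks (database_files, genome browser folder, rfam2go and md5 files) would follow the same pattern.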