Rfam / rfam-production

Rfam production pipeline

Do not remove `GR` lines from SEED alignments with 3D structure annotations #115

Closed: AntonPetrov closed this issue 1 year ago

AntonPetrov commented 2 years ago

For example, the RF00008 alignment in SVN includes a GR line like this:

#=GR URS000080DE02_32630/1-69 2QUS_A_SS (((((((((((..((((((.....{.))))))(....).((((...............................}))))...))))))))))).....

This line is manually curated and very important, so it should be included in the official Rfam files. However, the alignments on the FTP and the website do not have this line, so something must be stripping out the GR lines.
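For reference, a Stockholm `#=GR` line has three whitespace-separated fields after the tag: the sequence name, a feature name, and one annotation character per alignment column. Below is a minimal Python sketch of splitting the line above into those fields. The `GRLine` type and `parse_gr_line` helper are illustrative names, not part of the Rfam codebase.

```python
from typing import NamedTuple, Optional


class GRLine(NamedTuple):
    seqname: str     # e.g. URS000080DE02_32630/1-69
    feature: str     # e.g. 2QUS_A_SS
    annotation: str  # one character per alignment column


def parse_gr_line(line: str) -> Optional[GRLine]:
    """Return the fields of a Stockholm #=GR line, or None for any other line."""
    parts = line.split(None, 3)
    if len(parts) != 4 or parts[0] != "#=GR":
        return None
    _, seqname, feature, annotation = parts
    # The last field keeps trailing whitespace/newline after a limited split.
    return GRLine(seqname, feature, annotation.rstrip())


# The GR line quoted in this issue (RF00008 SEED).
EXAMPLE = (
    "#=GR URS000080DE02_32630/1-69 2QUS_A_SS "
    "(((((((((((..((((((.....{.))))))(....).((((..............................."
    "}))))...))))))))))).....\n"
)

record = parse_gr_line(EXAMPLE)
print(record.seqname, record.feature)
```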

It seems like some modifications are needed at the Generate annotated files step of the release: https://github.com/Rfam/rfam-production/tree/master/docs/release#generate-annotated-files

It's possible that GR lines could be removed for a reason, so I would check for the presence of GR lines at every step of the annotated_files.nf pipeline.
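One way to run that check is to count `#=GR` lines in the Stockholm output of each step and report the first step where the count drops. The following is a rough Python sketch of that idea only. The stage names and file paths in the usage note are hypothetical placeholders and are not taken from annotated_files.nf.

```python
from typing import Iterable, List, Optional, Tuple


def count_gr_lines(lines: Iterable[str]) -> int:
    """Count per-residue annotation (#=GR) lines in Stockholm text."""
    return sum(1 for line in lines if line.startswith("#=GR"))


def read_lines(path: str) -> List[str]:
    """Read a file leniently; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()


def first_stage_losing_gr(
    stages: Iterable[Tuple[str, Iterable[str]]]
) -> Optional[str]:
    """Given (stage_name, lines) pairs in pipeline order, return the name of
    the first stage whose output has fewer #=GR lines than the stage before
    it, or None if the count never drops."""
    previous = None
    for name, lines in stages:
        current = count_gr_lines(lines)
        if previous is not None and current < previous:
            return name
        previous = current
    return None
```

Usage, with placeholder stage names and paths, might look like `first_stage_losing_gr([("svn_seed", read_lines("RF00008/SEED")), ("annotated_seed", read_lines("RF00008.seed"))])`. A result of `None` would mean no stage dropped GR lines for that family.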

This should be done before the Rfam 15 paper is submitted so that users can see these annotations.

emmaco commented 1 year ago

GR lines were not being written to file by the writeAnnotatedSeed.pl jiffy, which is called when we write the seed files for the release process. This has been updated in https://github.com/Rfam/rfam-family-pipeline/pull/89