Closed (kalvari closed this 3 years ago)
@AntonPetrov This should be scheduled for release 14.6
The following rfam_id values correspond to the families to fix, along with the number of associated accessions:
Query to fetch miRNA families with >1 accessions:
select rfam_id, count(rfam_acc) as test from family
where type like '%miRNA%'
group by rfam_id
having test > 1;
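As a rough, self-contained illustration of what the query above returns, here is a minimal Python sketch that runs an equivalent query against an in-memory SQLite table. SQLite is only a stand-in for the real database, the table is reduced to three columns, and the type strings are illustrative assumptions. The mir-506 accessions are the ones discussed in this issue.

```python
import sqlite3

# In-memory stand-in for the family table; only the columns the query needs.
# The type strings below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table family (rfam_acc text primary key, rfam_id text, type text)"
)
conn.executemany(
    "insert into family values (?, ?, ?)",
    [
        ("RF01910", "mir-506", "Gene; miRNA;"),
        ("RF03529", "mir-506", "Gene; miRNA;"),
        ("RF03551", "mir-506", "Gene; miRNA;"),
        ("RF00005", "tRNA", "Gene; tRNA;"),
    ],
)


def duplicated_mirna_ids(conn):
    """Return (rfam_id, accession count) for miRNA IDs shared by >1 accession."""
    rows = conn.execute(
        """
        select rfam_id, count(rfam_acc) as n_acc
        from family
        where type like '%miRNA%'
        group by rfam_id
        having count(rfam_acc) > 1
        order by rfam_id
        """
    )
    return rows.fetchall()


print(duplicated_mirna_ids(conn))
```

The sketch repeats the aggregate in the having clause instead of using the alias, which keeps the query portable across SQL dialects.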
Note an abnormal subfolder, mir-278/, in the following SVN directory:
https://xfamsvn.ebi.ac.uk/svn/data_repos/trunk/Families/RF00729/
As indicated by Ioanna, this problem happens when a new family is committed with an existing ID. The pipeline adds the family to RfamLive but crashes while committing to the SVN so the family is only in the database (and on the website) but not in the SVN (so there is no CM). This is how we get 👻 families.
I identified all the affected families using the following query and went through them one by one:
select rfam_id from family group by rfam_id having count(*) > 1;
I deleted all families that were only in the database and not in the SVN using a new jiffy script, kill_family_in_db_no_svn.pl, which is essentially rfkill.pl without any of the SVN Perl classes. These IDs are now in the dead_family table.
I re-created the families using unique IDs where necessary and made sure that they were added both to the SVN and the database.
Once @nawrockie adds a QC that ensures that rfnew.pl will reject DESC files if an ID is already in the database, this problem should not happen again.
A new QC has been added.
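For readers who want a picture of what such a check amounts to, here is a minimal sketch in Python. It is not the QC that was actually added to rfnew.pl. The function names, the two-column table, the SQLite stand-in database, and the ID mir-999 are all assumptions made for illustration.

```python
import sqlite3


def id_is_taken(conn, rfam_id):
    """Return True if any row in family already uses this rfam_id."""
    row = conn.execute(
        "select 1 from family where rfam_id = ? limit 1", (rfam_id,)
    ).fetchone()
    return row is not None


def check_new_family_id(conn, rfam_id):
    """Raise before anything is written if the ID is already in use."""
    if id_is_taken(conn, rfam_id):
        raise ValueError(
            f"ID {rfam_id!r} already exists in the database; choose a unique ID"
        )


# In-memory stand-in for the family table (hypothetical reduced schema).
conn = sqlite3.connect(":memory:")
conn.execute("create table family (rfam_acc text primary key, rfam_id text)")
conn.execute("insert into family values ('RF01910', 'mir-506')")

check_new_family_id(conn, "mir-999")  # hypothetical unused ID: passes silently
try:
    check_new_family_id(conn, "mir-506")  # existing ID: rejected
except ValueError as err:
    print("rejected:", err)
```

The point of running the check first is that a duplicate ID is refused before the database is touched, so a later SVN failure cannot leave a database-only entry behind.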
There is some inconsistency between the families existing in the rfam_live database and the SVN repo, which would need to be resolved. The corresponding Rfam family accessions are listed below:
These accessions belong to miRNA families, which were recently updated from miRBase.
For example: RF03551 represents miRNA mir-506, along with accessions RF03529 and RF01910, the latter of which is the old/initial accession assigned to this family. Attempting to re-commit a family with an ID that already exists in Rfam creates a new entry in rfam_live, but the SVN commit fails, resulting in "ghost families". All associated entries can be found by executing the following query:
This query will return the following 3 accessions for mir-506:
Solution:
rfam_live
rfco.pl (e.g. RF01910)
rfsearch.pl followed by rfmake.pl (for thresholds see the relevant report)
add_ref.pl
rqc-all.pl
rfci.pl
rfam_live (e.g. RF03529, RF03551)
Note: rfkill.pl does not work in this case because there are no entities in the SVN repository for accessions RF03529, RF03551. Hence the term "ghost families".
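One way to picture the definition of a ghost family is as a set difference: accessions that exist in the database but have no directory in the SVN. The sketch below assumes both lists have already been collected (for example, from the family table and from a listing of the SVN Families directory). The function name is hypothetical, and the accessions are the mir-506 ones from this issue.

```python
def find_ghost_families(db_accessions, svn_accessions):
    """Return accessions present in the database but missing from the SVN."""
    return sorted(set(db_accessions) - set(svn_accessions))


# mir-506 example from this issue: only the original accession is in the SVN.
db_accessions = ["RF01910", "RF03529", "RF03551"]
svn_accessions = ["RF01910"]

print(find_ghost_families(db_accessions, svn_accessions))
```

Anything this returns is only in the database, so, as noted above, rfkill.pl cannot be used to remove it.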