Open jdzm opened 3 years ago
dbnsfp33a will be used for PM1; dbnsfp31a_interpro is actually not used, and this dataset will be removed in a future version. As for the delimiter, it does affect some of the domains. Thanks for pointing this out; the PM1 datasets have already been updated.
Thanks a lot for the update! Will dbnsfp41a be available at some point? It would be really helpful to be able to use that version with Annovar.
It is available in ANNOVAR. You may need to change the config file in InterVar to use dbnsfp version 4.
Dear Kai, thank you for the suggestion. I have already tried, but the names of the databases (for the Annovar subprocess) are hardcoded in the Intervar.py script (inside check_annovar_result(), at line 535). I am afraid that changing hardcoded variables will affect the ACMG classification procedure.
The only thing that changes when I add any database to the database_names variable in the config file is that the database will be installed by the function check_downdb() but not used in the actual Annovar run.
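The mismatch described above can be made visible with a small check. This is only a sketch, not InterVar's code: the section name Annovar, the sample config text, and the HARDCODED_PROTOCOLS list are assumptions made for illustration. Only the database_names key and the dataset names come from this thread.

```python
import configparser

# Hypothetical config excerpt; the section name and database list are assumptions.
SAMPLE_CONFIG = """
[Annovar]
database_names = refGene dbnsfp33a dbnsfp31a_interpro dbnsfp41a
"""

# Hypothetical stand-in for the database names hardcoded in the annotation call.
HARDCODED_PROTOCOLS = ["refGene", "dbnsfp33a", "dbnsfp31a_interpro"]


def unused_databases(config_text, hardcoded):
    """Return databases listed in the config but absent from the hardcoded list."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    configured = parser.get("Annovar", "database_names").split()
    return [name for name in configured if name not in hardcoded]


# Databases that would be downloaded but never passed to the annotation run.
print(unused_databases(SAMPLE_CONFIG, HARDCODED_PROTOCOLS))
```

Anything this prints is a database that would be installed but ignored during annotation, which matches the behaviour reported here.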
Hi!
I am facing the same issue as jdzm, so I am wondering: what is the point of setting the database_names parameter in config.ini if it is actually not used by the annotation procedure?
Is it safe to manually update the version of the hardcoded databases in the check_annovar_result() function?
Thank you for implementing the InterVar software!
Regards, Luis.
Hi @lubertorubior, my feeling is that in general you should not modify those databases (even though some may be outdated), because the calculation of ACMG scores depends on the headers of the multianno table generated by Annovar, and these can vary between versions of the same database.
This being said, I think it is relatively safe to update databases like clinvar_20190305 to clinvar_20200316, provided you have first checked that the field definitions and column headers have not changed between versions. Bigger databases like dbnsfp33a are a different problem. In this particular case, updating to dbnsfp41c is delicate because some fields change from integer values to text values (as is the case for "SIFT_score" and "SIFT_pred"). This messes up the assignment of scores such as PP3 and BP4, so the update is trickier: you have to be extra careful, check each database and how each score is computed, and then probably add some lines of code to work around the changes in the database.
Hope this is helpful, Juan.
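The header check suggested in this comment could look roughly like the sketch below. It compares the first line of two tab-separated multianno tables and reports columns that are missing, new, or shifted. The function names and the example headers are made up for illustration; with real files you would open the two multianno.txt outputs instead of the in-memory strings.

```python
import csv
import io


def read_header(handle):
    """Return the column names from the first line of a tab-separated table."""
    return next(csv.reader(handle, delimiter="\t"))


def compare_headers(old_header, new_header):
    """Report columns that disappeared, appeared, or changed position."""
    old_set, new_set = set(old_header), set(new_header)
    shared = [c for c in old_header if c in new_set]
    moved = [c for c in shared if old_header.index(c) != new_header.index(c)]
    return {
        "missing": [c for c in old_header if c not in new_set],
        "added": [c for c in new_header if c not in old_set],
        "moved": moved,
    }


# Made-up headers standing in for two runs with different database versions.
old = io.StringIO("Chr\tStart\tSIFT_score\tSIFT_pred\tCLNSIG\n")
new = io.StringIO(
    "Chr\tStart\tSIFT_score\tSIFT_converted_rankscore\tSIFT_pred\tCLNSIG\n"
)
report = compare_headers(read_header(old), read_header(new))
print(report)
```

An empty report is a necessary condition for a safe swap, not a sufficient one: a column can keep its name and position while its values change type, so the contents still need a manual look.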
Yes, the column headers sometimes change between database versions. That is why the database column names were hardcoded, and why we do not suggest that people directly replace the database names if they are not familiar with the details of the databases.
Hi,
I have been running InterVar to annotate my variants and I have realized that in the annovar subprocess, dbnsfp33a and dbnsfp31a_interpro are used, both of which add one column named Interpro_domain to the multianno.txt file. I checked the formatting of these two fields to see if they were matching and I found that they use a different delimiter: | instead of ; (for example, AAA+ ATPase domain|ABC transporter-like vs AAA+ ATPase domain;ABC transporter-like).
I guess that InterVar considers only the first occurrence of the Interpro_domain column, therefore possibly ignoring dbnsfp31a_interpro. How is the matching of these fields done? Could this affect the annotation done by check_PM1() when comparing to the PM1_domains_with_benigns InterVar database? Does this mean that dbnsfp31a_interpro is ignored?
Thanks for your time, Best, Juan.
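To illustrate why the delimiter matters, here is a minimal sketch of a delimiter-tolerant split of the Interpro_domain field. It is not how check_PM1() works; split_domains is a hypothetical helper, and treating "." as a missing value is an assumption.

```python
import re


def split_domains(field):
    """Split an Interpro_domain value on either '|' or ';' and trim whitespace."""
    if field in ("", "."):  # '.' treated as a missing value: an assumption
        return []
    return [part.strip() for part in re.split(r"[|;]", field) if part.strip()]


a = split_domains("AAA+ ATPase domain|ABC transporter-like")
b = split_domains("AAA+ ATPase domain;ABC transporter-like")
# Both spellings yield the same list of domain names.
print(a == b)
```

A comparison that matches the raw string against a domain list would treat the two example values as different entries, whereas splitting first makes them equivalent.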