Several Interpro_domain columns

WGLab / InterVar

A bioinformatics software tool for clinical interpretation of genetic variants by the 2015 ACMG-AMP guideline

Several Interpro_domain columns #54

Open jdzm opened 3 years ago

jdzm commented 3 years ago

Hi,

I have been running InterVar to annotate my variants and noticed that the ANNOVAR subprocess uses both dbnsfp33a and dbnsfp31a_interpro, each of which adds a column named Interpro_domain to the multianno.txt file. I compared the formatting of the two fields and found that they use different delimiters: | in one and ; in the other (for example, AAA+ ATPase domain|ABC transporter-like vs AAA+ ATPase domain;ABC transporter-like).

I guess that InterVar considers only the first occurrence of the Interpro_domain column and therefore possibly ignores dbnsfp31a_interpro. How is the matching of these fields done? Could this affect the annotation done by check_PM1() when comparing against the PM1_domains_with_benigns InterVar database? Does this mean that dbnsfp31a_interpro is ignored?
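For anyone who wants to check their own output, here is a minimal sketch of the comparison described above. It is not InterVar code: the helper names `duplicate_columns` and `split_domains` are hypothetical, and the sample row is made up (it does not indicate which database uses which delimiter). It only shows how to find repeated column names in a tab-separated header and compare two Interpro_domain values regardless of whether they use | or ;.

```python
import csv
import io
import re
from collections import defaultdict


def duplicate_columns(header):
    """Map each repeated column name to the positions where it occurs."""
    positions = defaultdict(list)
    for index, name in enumerate(header):
        positions[name].append(index)
    return {name: idx for name, idx in positions.items() if len(idx) > 1}


def split_domains(value):
    """Split a domain field on '|' or ';', dropping empty items and '.' placeholders."""
    parts = (part.strip() for part in re.split(r"[|;]", value))
    return [part for part in parts if part and part != "."]


# Toy stand-in for a multianno.txt file (made-up row, two Interpro_domain columns).
sample = (
    "Chr\tStart\tInterpro_domain\tSIFT_score\tInterpro_domain\n"
    "1\t100\tAAA+ ATPase domain;ABC transporter-like\t0.01\t"
    "AAA+ ATPase domain|ABC transporter-like\n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
header = next(reader)
row = next(reader)

dups = duplicate_columns(header)
first, second = (split_domains(row[i]) for i in dups["Interpro_domain"])

print(dups)             # which column names are repeated, and where
print(first == second)  # whether both columns list the same domains
```

To run this against a real file, replace the `io.StringIO(sample)` object with an open file handle for the multianno.txt output.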

Thanks for your time, Best, Juan.

quanliustc commented 3 years ago

dbnsfp33a is the one used for PM1; dbnsfp31a_interpro is actually not used, and we will remove this dataset in a future version. As for the delimiter, it does affect some of the domains. Thanks for pointing this out; the PM1 datasets have already been updated.

jdzm commented 3 years ago

Thanks a lot for the update! Will dbnsfp41a be available at some point? It would be really helpful to be able to use that version with Annovar.

kaichop commented 3 years ago

It is available in ANNOVAR. You may need to change the config file in InterVar to use dbnsfp version 4.

jdzm commented 3 years ago

Dear Kai, thank you for the suggestion. I have already tried that, but the names of the databases (for the Annovar subprocess) are hardcoded in the Intervar.py script (inside check_annovar_result(), at line 535). I am afraid that changing hardcoded variables will affect the ACMG classification procedure.

The only thing that changes when I add a database to the database_names variable in the config file is that the database gets installed by the function check_downdb(); it is still not used in the actual Annovar run.

lubertorubior commented 3 years ago

Hi!

I am facing the same issue as jdzm, so I am wondering: what is the point of setting the database_names parameter in config.ini if it is not actually used by the annotation procedure?

Is it safe to manually update the version of the hardcoded databases in the check_annovar_result() function?

Thank you for implementing the InterVar software!

Regards, Luis.

jdzm commented 3 years ago

Hi @lubertorubior, my feeling is that in general you should not modify those databases (even though some may be outdated), because the calculation of the ACMG scores depends on the headers of the multianno table generated by Annovar, and these can vary between versions of the same database.

That said, I think it is relatively safe to update databases like clinvar_20190305 to clinvar_20200316, provided you first check that the field definitions and column headers have not changed between versions. Bigger databases like dbnsfp33a are a different problem. In this particular case, updating to dbnsfp41c is delicate because some fields change from integer values to text values (as is the case for "SIFT_score" and "SIFT_pred"). This interferes with the assignment of scores such as PP3 and BP4, so the update is trickier: you have to check each database and how each score is computed, and then probably add some lines of code to work around the changes in the database.
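As a rough illustration of that check, the sketch below compares two versions of a tab-separated annotation table. It reports columns that were removed or added, and shared columns whose values were all numeric in the old version but are not in the new one. Everything here is an assumption for illustration: `compare_versions` and the other helper names are hypothetical, the two tables are made-up toy data rather than real dbNSFP content, and "." is treated as a missing value.

```python
import csv
import io


def load_table(text):
    """Parse tab-separated text into (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    return header, list(reader)


def is_numeric(value):
    """True if the value parses as a number; '.' (missing) is tolerated."""
    if value == ".":
        return True
    try:
        float(value)
        return True
    except ValueError:
        return False


def numeric_columns(header, rows):
    """Names of the columns in which every value is numeric or missing."""
    return {
        name
        for i, name in enumerate(header)
        if all(is_numeric(row[i]) for row in rows)
    }


def compare_versions(old_text, new_text):
    """Summarise header and value-type differences between two table versions."""
    old_header, old_rows = load_table(old_text)
    new_header, new_rows = load_table(new_text)
    shared = set(old_header) & set(new_header)
    lost_numeric = numeric_columns(old_header, old_rows) - numeric_columns(
        new_header, new_rows
    )
    return {
        "removed": sorted(set(old_header) - set(new_header)),
        "added": sorted(set(new_header) - set(old_header)),
        "no_longer_numeric": sorted(lost_numeric & shared),
    }


# Made-up toy tables standing in for two versions of the same database.
old_version = "SIFT_score\tSIFT_pred\tOld_only\n0.01\tD\t1\n0.5\tT\t2\n"
new_version = "SIFT_score\tSIFT_pred\tNew_only\n0.01;0.02\tD;D\tx\n0.5\tT\ty\n"

report = compare_versions(old_version, new_version)
print(report)
```

Any column listed under `removed` or `no_longer_numeric` is a candidate for breaking a score computation that reads it, so those are the ones to inspect before swapping database versions.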

Hope this is helpful, Juan.

quanliustc commented 3 years ago

Yes, the column headers sometimes change between database versions. That is why the database column names are hardcoded, and we do not suggest that people directly replace the database names if they are not familiar with the details of the databases.