Problem Description
ProSitePatterns is a pattern-matching protein database. It supports only exact matches, not partial ones. Because a pattern either matches or does not, the score (e-value) column of the InterProScan TSV is left blank for these matches. Pygenprop is not currently compatible with a blank e-value column.
Errors
If an e-value column contains no value, parsing fails because the blank e-value is recorded as np.nan, and np.nan cannot be written to the e-value column of the Micromeda file (SQLite).
  File "/Users/lee/Dropbox/RandD/Repositories/pygenprop/pygenprop/results.py", line 721, in connect_step_assignments_to_interproscan_matches
    current_interproscan = unique_interproscan_dict[interpro_signature][protein_identifier][e_value]
KeyError: nan
Problem Solution
Temporary Solution
Python script to sanitize ProSite matches in InterProScan TSVs: https://gist.github.com/LeeBergstrand/d429041fa50698fec5a83ddb2a295ed0
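For readers who cannot open the gist, here is a minimal sketch of the kind of sanitizing such a script could do. It is not the contents of the linked gist. It assumes the score (e-value) is the ninth tab-separated column of the InterProScan TSV and that a missing score appears as either an empty field or `-`. The names `sanitize_rows` and `sanitize_tsv` and the `placeholder` parameter are hypothetical.

```python
SCORE_COLUMN = 8        # assumption: zero-based index of the score (e-value) column
MISSING = {"", "-"}     # assumption: spellings of a missing score


def sanitize_rows(rows, placeholder=None):
    """Yield rows that carry a score; drop or fill the rows that do not.

    With placeholder=None, rows without a score are dropped.
    Otherwise the missing score is replaced by the placeholder string.
    """
    for row in rows:
        has_column = len(row) > SCORE_COLUMN
        if has_column and row[SCORE_COLUMN].strip() not in MISSING:
            yield row
        elif has_column and placeholder is not None:
            filled = list(row)
            filled[SCORE_COLUMN] = placeholder
            yield filled
        # Any other row is dropped.


def sanitize_tsv(text, placeholder=None):
    """Sanitize TSV text and return the cleaned TSV text."""
    rows = (line.split("\t") for line in text.splitlines() if line)
    return "".join("\t".join(row) + "\n" for row in sanitize_rows(rows, placeholder))
```

Dropping the rows discards the ProSitePatterns matches entirely, while a placeholder keeps them at the cost of storing a score that InterProScan never reported. Which trade-off is acceptable depends on how the e-value is used downstream.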
Long Term Solution
TODO - Edit Pygenprop to sanitize TSVs internally.
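One possible shape for an internal fix, offered as a sketch rather than as Pygenprop's actual code: normalize the score as it is parsed so that a missing value becomes `None` instead of NaN. The `KeyError: nan` in the traceback is consistent with NaN never comparing equal to itself, so a dictionary lookup with a NaN key that is a different object from the stored key fails. `None` does not have that problem, and Python's `sqlite3` module stores it as SQL NULL. Whether the Micromeda schema allows NULL in the e-value column is not known here. The function name `normalize_e_value` is hypothetical.

```python
import math


def normalize_e_value(raw):
    """Return the score as a float, or None when no score is present."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in ("", "-"):       # assumption: spellings of a missing score
        return None
    value = float(text)
    return None if math.isnan(value) else value


# A NaN key cannot be found again through a different NaN object.
nan_keyed = {float("nan"): "match"}
lookup_with_nan_works = float("nan") in nan_keyed

# A None key is found reliably.
none_keyed = {normalize_e_value(float("nan")): "match"}
lookup_with_none_works = normalize_e_value("") in none_keyed
```

Here `lookup_with_nan_works` is `False` and `lookup_with_none_works` is `True`.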