iLearnPlus is the first machine-learning platform with both graphical and web-based user interfaces that enables the construction of automated machine-learning pipelines for computational analysis and prediction using nucleic acid and protein sequences.
Show a warning if the special FASTA header format is violated #2
In a large dataset of automatically downloaded sequences, some names can include the "|" symbol.
I also append the class and train/test labels automatically.
So when I try to analyze such a file, I get uninformative error messages like:
ValueError: could not convert string to float: 'P42577.2'
ValueError: invalid literal for int() with base 10: '6LPD'
which are caused by incorrect FASTA headers:
P42577.2_sp|P42577.2|FRIS_LYMST|0|training
6LPD_pdb|6LPD|F|1|training
A simple check when importing the file could show a warning to the user.
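A rough sketch of what such a check might look like is below. It is not taken from the iLearnPlus code base: the function names `check_fasta_header` and `warn_on_bad_headers` are hypothetical, and the expected layout `name|label|role` (three "|"-separated fields, an integer label, and a role of `training` or `testing`) is an assumption inferred from the headers and errors above.

```python
import warnings

# Assumption: a valid header looks like ">name|label|role".
EXPECTED_FIELDS = 3
ALLOWED_ROLES = {"training", "testing"}  # assumed set of role values


def check_fasta_header(header):
    """Return a list of problems found in one FASTA header (empty if none)."""
    fields = header.strip().lstrip(">").split("|")
    if len(fields) != EXPECTED_FIELDS:
        # Extra "|" characters in the sequence name end up here.
        return [f"expected {EXPECTED_FIELDS} '|'-separated fields, "
                f"found {len(fields)}"]
    problems = []
    _name, label, role = fields
    try:
        int(label)
    except ValueError:
        problems.append(f"label {label!r} is not an integer")
    if role not in ALLOWED_ROLES:
        problems.append(f"role {role!r} is not one of {sorted(ALLOWED_ROLES)}")
    return problems


def warn_on_bad_headers(lines):
    """Warn about every malformed header line; return how many were found."""
    bad_headers = 0
    for line_number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence lines are not checked
        problems = check_fasta_header(line)
        if problems:
            bad_headers += 1
            for problem in problems:
                warnings.warn(f"line {line_number}: {problem}: {line.strip()}")
    return bad_headers
```

With the two headers from this report, `check_fasta_header` would flag both for having five fields instead of three, which points directly at the extra "|" characters rather than at a failed number conversion.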