Hayfabm / AMP-app

deepchain.bio Antimicrobial peptide recognition
Key error from pandas #1

Open Darcy220606 opened 2 years ago

Darcy220606 commented 2 years ago

Hi Hayfa,

Thanks a lot for setting up this CLI version of the AMP scanner. I was trying to set it up locally, but I keep getting a KeyError: 'sequences'. The error also seems to be linked to the utils.py script. I'm testing the tool with the training datasets you have available online and with a custom test dataset, by running $ python AMP.py [input.fa]. Any idea how I can resolve this issue?

Many thanks for your help!!

Traceback (most recent call last):
  File "/conda_envs/amp-app/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sequences'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/AMP-app/AMP.py", line 141, in <module>
    sequences, labels = np.array(create_dataset(data_path=DATASET))
  File "/AMP-app/utils.py", line 11, in create_dataset
    return list(dataset["sequences"]), list(dataset["label"])
  File "/conda_envs/amp-app/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/conda_envs/amp-app/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'sequences'

Hayfabm commented 2 years ago

Thank you for considering this application. To solve this issue, can you check the utils.py script and make sure that the create_dataset function ends with return list(dataset["sequence"]), list(dataset["label"]) and NOT return list(dataset["sequences"]), list(dataset["label"])?

from typing import List, Tuple
import pandas as pd

def create_dataset(data_path: str) -> Tuple[List[str], List[int]]:
    # Load the CSV; it must contain "sequence" and "label" columns
    dataset = pd.read_csv(data_path)
    dataset = dataset.sample(frac=1).reset_index(drop=True)  # shuffle the dataset
    return list(dataset["sequence"]), list(dataset["label"])

Darcy220606 commented 2 years ago

Thanks for your quick reply. I changed it accordingly, but I still get a similar error: KeyError: 'sequence'

Traceback (most recent call last):
  File "/conda_envs/amp-app/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sequence'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/AMP-app/AMP.py", line 141, in <module>
    sequences, labels = np.array(create_dataset(data_path=DATASET))
  File "/AMP-app/utils.py", line 11, in create_dataset
    return list(dataset["sequence"]), list(dataset["label"])
  File "/conda_envs/amp-app/lib/python3.7/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/conda_envs/amp-app/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'sequence'

Hayfabm commented 2 years ago

No worries, it was an issue related to the data. I just uploaded the correct data file, all_Data.csv. Enjoy!!!
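
A mismatch between a CSV header and the column names that create_dataset indexes can be caught before loading, with a clearer message than the pandas KeyError. The sketch below is only an illustration and is not part of AMP-app: it uses the standard csv module instead of pandas so that it runs on its own, and read_header, check_columns, and REQUIRED_COLUMNS are hypothetical names. The sample row is made up.

```python
import csv
import io
from typing import List, Sequence, TextIO

# Column names taken from the create_dataset function quoted in this thread.
REQUIRED_COLUMNS = ("sequence", "label")


def read_header(handle: TextIO) -> List[str]:
    """Return the column names found on the first line of a CSV file."""
    first_row = next(csv.reader(handle), [])
    return [name.strip() for name in first_row]


def check_columns(header: Sequence[str],
                  required: Sequence[str] = REQUIRED_COLUMNS) -> None:
    """Raise a descriptive ValueError if any required column is absent."""
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(
            f"missing column(s) {missing}; the file provides {list(header)}"
        )


# Made-up sample whose header says "sequences" instead of "sequence".
sample = io.StringIO("sequences,label\nGLFDIVKKVVGALGSL,1\n")
header = read_header(sample)
try:
    check_columns(header)
except ValueError as err:
    print(err)
```

With pandas, the same check_columns call could be given list(dataset.columns) right after pd.read_csv.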

Darcy220606 commented 2 years ago

Hi @Hayfabm

Thanks a lot for following up on the error and updating the data and script files. Regarding the output, there doesn't seem to be a sequence identifier before the prediction results in the table 'Antimicrobial recognition.txt'. Is that expected, and where can I find the identifiers?

Many thanks once again.


acc,sensitivity,specificity,mcc,Roc_auc,Roc_pr
0.8765432098765432,0.8415841542495834,0.9113300447717732,0.7548495571138049,0.9423986733648735,0.9540515969171111
0.908641975308642,0.8571428529204785,0.9603960348495246,0.8217581174033954,0.9651758279276205,0.9714525215491914
0.9108910891089109,0.8613861343495736,0.9603960348495246,0.8258399856982515,0.9537545338692285,0.9617516423688823
0.9133663366336634,0.8910891044995589,0.9356435597245368,0.8275544702694583,0.9674296637584551,0.9682923557732053
0.8935643564356436,0.8663366293745711,0.9207920746495442,0.7882983889008571,0.9483874129987256,0.9572226613364542
0.9158415841584159,0.9059405895745516,0.9257425696745417,0.8318462754108389,0.9734094696598373,0.9760802374115075
0.8960396039603961,0.8514851442995786,0.9405940547495344,0.7952427724679769,0.957063033035977,0.9632577523953173
0.8861386138613861,0.8366336592245859,0.9356435597245368,0.7760905889694412,0.9629693167336535,0.966777086469228
0.9381188118811881,0.9158415796245466,0.9603960348495246,0.8771086301658331,0.9748799137339477,0.9784152592072608
0.8910891089108911,0.900990094549554,0.8811881144495638,0.7823316161601936,0.9611067542397804,0.9657823404985096

acc=90.30% (+/- 1.69%)
sensitivity=87.28% (+/- 2.69%)
specificity=93.32% (+/- 2.38%)
mcc=80.81% (+/- 3.36%)
roc_auc=96.07% (+/- 0.99%)
roc_pr=96.63% (+/- 0.73%)
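
The summary figures can be reproduced from the ten per-fold rows above. This is a minimal sketch for the acc column only, assuming the reported spread is the population standard deviation over the folds. That assumption matches the posted acc figure, but how AMP.py actually computes it is not shown in this thread. summarize is a hypothetical helper name.

```python
from statistics import mean, pstdev
from typing import Sequence

# Accuracy column copied from the ten rows posted above.
acc = [
    0.8765432098765432,
    0.908641975308642,
    0.9108910891089109,
    0.9133663366336634,
    0.8935643564356436,
    0.9158415841584159,
    0.8960396039603961,
    0.8861386138613861,
    0.9381188118811881,
    0.8910891089108911,
]


def summarize(name: str, values: Sequence[float]) -> str:
    """Format mean and population standard deviation as percentages."""
    return f"{name}={mean(values) * 100:.2f}% (+/- {pstdev(values) * 100:.2f}%)"


print(summarize("acc", acc))  # acc=90.30% (+/- 1.69%)
```

The other columns can be summarized the same way by passing their values to summarize.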