bio-ontology-research-group / deepgoplus

DeepGO with GOPlus axioms
BSD 3-Clause "New" or "Revised" License
89 stars 41 forks

Evaluate the predictions.pkl provided on CAFA3 with CAFA_assessment_tool #47

Open Worldseer opened 1 year ago

Worldseer commented 1 year ago

I uncommented lines 152-162 of evaluate_deepgoplus.py and added the following export code. The DeepGOPlus_1_all.txt file was generated from the predictions.pkl file provided for CAFA3, and I then evaluated it with the official evaluation tool, but the results did not match those in the paper. I assume I cannot export the txt file directly like this, so what do I need to modify to get the same results as in the paper?

  print("exporting predictions to CAFA submission format")
  txt_out=[]

  txt_out.append('AUTHOR None\n')
  txt_out.append('MODEL 1\n')
  txt_out.append('KEYWORDS natural language processing.\n')
  for i, row in enumerate(test_df.itertuples()):
      prot_id = row.proteins
      for go_id, score in deep_preds[i].items():
          #print(f'{prot_id}\t{go_id}\t{score:.2f}')
          score_str = "{0:.2f}".format(score)
          if(score_str!="0.00"):
              txt_out.append(str(prot_id)+"\t"+str(go_id)+"\t"+score_str+"\n")
  txt_out.append('END')
  with open(filename, 'w') as f:
      f.writelines(txt_out)
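
For reference, the exported file then looks like this (fields are tab-separated; the target and GO identifiers below are made-up placeholders to illustrate the layout):

  AUTHOR None
  MODEL 1
  KEYWORDS natural language processing.
  T001    GO:0005515    0.72
  T001    GO:0016020    0.35
  END
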
coolmaksat commented 1 year ago

Hi, you need to use this tool to get the same results as in the paper for the CAFA3 evaluation: https://github.com/ashleyzhou972/CAFA_assessment_tool

If I remember correctly, the deepgoplus-cafa.tar.gz archive should contain the txt files.

Worldseer commented 1 year ago

Thank you for your reply. I have tested the txt file you provided and it is consistent with the results in the article. I would also like to evaluate my own model, so could you tell me how you generated this txt file, i.e. the exact steps for producing the txt file from predictions.pkl? It would be great if you could share the corresponding code with me. You can send it to me via email at quupeng@163.com. Once again, my thanks!

coolmaksat commented 1 year ago

I think I used the evaluate_cafa3.py script to generate them. Uncomment lines 124-132.

Worldseer commented 1 year ago

Thank you for taking the time to reply; I will try it again. My previous attempt did not turn out right, as I mentioned in my first question.

Worldseer commented 1 year ago

Hello Dr. Kulmanov, I am back; I have tested several times. The first table below is for the txt file generated from data-cafa3/predictions.pkl, and the second is for the txt file you provided directly.

%Results of the txt file generated using the predictions.pkl file you provided 
Ontology    Type    Mode     | Fmax Threshold   Coverage
bpo NK  partial  | 0.3864840946855839   0.18    1.0
bpo NK  full     | 0.3864840946855839   0.18    1.0
bpo LK  partial  | 0.4072675565869404   0.17    1.0
bpo LK  full     | 0.4072675565869404   0.17    1.0
cco NK  partial  | 0.6106824202178297   0.25    1.0
cco NK  full     | 0.6106824202178297   0.25    1.0
cco LK  partial  | 0.6000260772486744   0.22    1.0
cco LK  full     | 0.6000260772486744   0.22    1.0
mfo NK  partial  | 0.5549217805948832   0.12    1.0
mfo NK  full     | 0.5549217805948832   0.12    1.0
mfo LK  partial  | 0.5376631411272229   0.11    1.0
mfo LK  full     | 0.5376631411272229   0.11    1.0
%Results on CAFA_assessment_tool using the txt file you provided
Species: all
Ontology    Type    Mode     | Fmax Threshold   Coverage
bpo NK  partial  | 0.3899217629352804   0.18    1.0
bpo NK  full     | 0.3899217629352804   0.18    1.0
bpo LK  partial  | 0.4098261625978569   0.14    1.0
bpo LK  full     | 0.4098261625978569   0.14    1.0
cco NK  partial  | 0.6126849280111355   0.25    1.0
cco NK  full     | 0.6126849280111355   0.25    1.0
cco LK  partial  | 0.5963076619060491   0.23    1.0
cco LK  full     | 0.5963076619060491   0.23    1.0
mfo NK  partial  | 0.5561658907106347   0.12    1.0
mfo NK  full     | 0.5561658907106347   0.12    1.0
mfo LK  partial  | 0.5217672872688168   0.11    1.0
mfo LK  full     | 0.5217672872688168   0.11    1.0

The thresholds on NK are the same for both, but the Fmax values differ somewhat. Why is that? This question has been bothering me for weeks; perhaps you can shed some light on it.

coolmaksat commented 1 year ago

Hi, I need to rerun everything and check before I can give an answer to this. One possible reason might be the ontology version, or, if you retrain the model, you might get somewhat different results. We have updated the model several times, so maybe we changed something there; I am not really sure. But I think the results are not significantly different and are almost the same as in the paper. Check out our latest model, DeepGOZero.

Worldseer commented 1 year ago

Thank you very much for your reply. The version of the ontology I used is go_cafa3.obo, and I just took DeepGOPlus's predictions.pkl for CAFA and verified it without retraining the model. I look forward to hearing from you again after you check. I will read your latest paper carefully and I believe it will give me a lot of ideas. Thanks again.

simon19891216 commented 1 year ago

I think I used the evaluate_cafa3.py script to generate them. Uncomment lines 124-132.

When I used evaluate_cafa3.py, one of the input files was predictions.pkl. How can I generate this file? I want to evaluate our own annotation results.

simon19891216 commented 1 year ago

Furthermore, since the related functions have already been developed, could evaluate_cafa3 be added to your website?

Worldseer commented 1 year ago

Hello simon, you can refer to https://github.com/bio-ontology-research-group/deepgoplus/blob/master/deepgoplus.py (line 186) to see how predictions.pkl is generated.
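
If you want to feed your own model's predictions through the same pipeline, a minimal sketch of saving them in a similar shape could look like the following; the column names and the per-protein score dictionaries are assumptions inferred from the export snippet above, not a confirmed description of deepgoplus.py, so check line 186 for the exact layout:

  import pandas as pd

  # Hypothetical example: one row per protein plus a dict mapping GO term -> score.
  # The column names ('proteins', 'preds') are an assumption; check deepgoplus.py
  # line 186 for the exact columns the evaluation script expects.
  rows = [
      {'proteins': 'T001', 'preds': {'GO:0005515': 0.72, 'GO:0003674': 0.91}},
      {'proteins': 'T002', 'preds': {'GO:0008150': 0.35}},
  ]
  test_df = pd.DataFrame(rows)
  # The repo's .pkl files are assumed to be pandas pickles.
  test_df.to_pickle('predictions.pkl')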

simon19891216 commented 1 year ago

Thank you. I will try it.

song2000012138 commented 11 months ago

How can I get the txt files from the predictions.pkl file? I uncommented lines 124-132, but no file was generated, so I changed lines 124-132 to:

  print("exporting predictions to CAFA submission format")
  txt_out = []

  txt_out.append('AUTHOR None\n')
  txt_out.append('MODEL 1\n')
  txt_out.append('KEYWORDS natural language processing.\n')
  for i, row in enumerate(test_df.itertuples()):
      prot_id = row.proteins
      for go_id, score in deep_preds[i].items():
          # print(f'{prot_id}\t{go_id}\t{score:.2f}')
          score_str = "{0:.2f}".format(score)
          if score_str != "0.00":
              txt_out.append(str(prot_id) + "\t" + str(go_id) + "\t" + score_str + "\n")
  txt_out.append('END')
  with open(filename, 'w') as f:
      f.writelines(txt_out)

However, the file generated each time is the same. How can I get the four files (mf, bp, cc, and the all.txt file)? Can you give me some guidance?

coolmaksat commented 11 months ago

Hello, the evaluate_cafa3.py script has a parameter for sub-ontology selection, '--ont'; you need to run it three times, with 'mf', 'bp' and 'cc'. To get the 'all' file, just concatenate the three files.
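
A minimal sketch of the concatenation step in Python, assuming the per-ontology files follow the AUTHOR/MODEL/KEYWORDS ... END layout shown earlier in the thread; the file names are hypothetical, and dropping the duplicated header and END lines is an assumption about what the assessment tool expects:

  # Hypothetical output names; adjust them to whatever evaluate_cafa3.py wrote for you.
  ont_files = ['DeepGOPlus_1_mf.txt', 'DeepGOPlus_1_bp.txt', 'DeepGOPlus_1_cc.txt']

  header, body = [], []
  for idx, path in enumerate(ont_files):
      with open(path) as f:
          for line in f:
              if line.startswith(('AUTHOR', 'MODEL', 'KEYWORDS')):
                  # keep the CAFA header lines from the first file only
                  if idx == 0:
                      header.append(line)
              elif line.strip() != 'END':
                  # prediction rows: protein <TAB> GO term <TAB> score
                  body.append(line)

  with open('DeepGOPlus_1_all.txt', 'w') as f:
      # one merged file with a single END marker at the end
      f.writelines(header + body + ['END\n'])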

song2000012138 commented 11 months ago

Hello, the evaluate_cafa3.py script has a parameter for sub-ontology selection, '--ont'; you need to run it three times, with 'mf', 'bp' and 'cc'. To get the 'all' file, just concatenate the three files.

OK, I'll try it. Thank you so much.