StellaHxy / EMNgly


N-GlyDE datasets #2

Open NancyFyong opened 3 months ago

NancyFyong commented 3 months ago

Hi, is there a download link for the N-GlyDE dataset? The paper link you sent only provides partial sequences, not the complete FASTA sequences. Thank you in advance. Best, zhiyongfeng

StellaHxy commented 3 months ago

Hi Zhiyong, Thank you for reaching out. You can obtain the complete sequences from the UniProt website. I'm not sure if there is a direct link available to access the complete sequences. Best regards, Xiaoyang
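
For readers who want to script the UniProt lookup mentioned above, here is a minimal Python sketch. It assumes each N-GlyDE entry can be mapped to a UniProt accession and that the public UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta` is used; neither detail comes from this thread or from the EMNgly code, and the function names `fasta_url`, `parse_fasta`, and `fetch_sequence` are made up for illustration.

```python
from typing import Dict
from urllib.request import urlopen

# Assumption: the public UniProt REST endpoint for single-entry FASTA downloads.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession: str) -> str:
    """Build the download URL for one UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_sequence(accession: str, timeout: float = 30.0) -> str:
    """Download one entry and return its full-length sequence (needs network access)."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    records = parse_fasta(text)
    if not records:
        raise ValueError(f"No FASTA record returned for {accession!r}")
    return next(iter(records.values()))
```

Calling `fetch_sequence` with an accession taken from the N-GlyDE tables should return the full-length sequence as a single string; looping over all accessions and writing the results to one file would give a complete FASTA set.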

NancyFyong commented 3 months ago

Thank you for your reply. Could you provide the complete N-GlyDE dataset in your project? Also, the test.csv file in your project contains 2554 entries, but the paper says only 2473 entries were used for testing. Is the difference the result of using cd-hit to remove redundancy? Thank you in advance. Best, zhiyongfeng

StellaHxy commented 3 months ago

Hi Zhiyong, Thank you for your message. You can obtain the N-GlyDE dataset from the website I provided in README.md. The test.csv dataset in my project does contain 2554 entries. However, some of these entries could not be encoded for various reasons, so the number of entries actually encoded and tested is 2473, as stated in the paper. You can run the code directly, and the ineligible entries will be excluded automatically. test.csv does not require further processing with cd-hit to remove redundancy. Best regards, Xiaoyang
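
The figures above imply that 2554 - 2473 = 81 entries are dropped before testing. The thread does not say which checks the project applies, so the Python sketch below only illustrates the general pattern of splitting a CSV into encodable and skipped rows. The `sequence` column name and the `has_sequence` check are hypothetical placeholders, not the project's actual criteria.

```python
import csv
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def load_rows(path: str) -> List[Row]:
    """Read a CSV file (for example test.csv) into a list of dict rows."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def has_sequence(row: Row) -> bool:
    """Hypothetical eligibility check: the row has a non-empty 'sequence' field."""
    return bool((row.get("sequence") or "").strip())


def split_eligible(
    rows: List[Row], is_eligible: Callable[[Row], bool]
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (kept, dropped) according to the supplied check."""
    kept: List[Row] = []
    dropped: List[Row] = []
    for row in rows:
        (kept if is_eligible(row) else dropped).append(row)
    return kept, dropped
```

Running `split_eligible(load_rows("test.csv"), has_sequence)` and comparing `len(kept)` against 2473 is one way to check whether a candidate filter matches the count reported in the paper; a mismatch would mean the real exclusion rule differs from the placeholder used here.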

NancyFyong commented 3 months ago

Thank you for your reply. Could you provide the PDB files for your N-GlyDE test set so that I can better reproduce your results? Thank you in advance. Best, zhiyongfeng