IndigoFloyd / SoyDNGPNext

A deep-learning-driven bioinformatics toolkit.

No data #1

Closed: chl2 closed this issue 4 months ago

chl2 commented 7 months ago

Thank you very much for open-sourcing your code. However, I could not find the data at http://xtlab.hzau.edu.cn/downloads/. We would like to use the same data for comparison; would it be possible for you to send it to me?

IndigoFloyd commented 7 months ago

Sorry about that. What kind of data do you need? I'm not sure whether you mean the model weights or something else. Please describe it in detail so I can fix this.

chl2 commented 7 months ago

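Would you be able to share your processed VCF file and the phenotype files for protein and POD? I downloaded the datasets from the two open-source websites mentioned in your paper and tried to reproduce the results, but the Pearson correlation coefficient for protein only reaches 0.2, and NaN values appear after 40 epochs. The classification accuracy for POD reaches 85%. How is the "intersection operation" mentioned in the paper carried out? And in "a selection process", what criterion determines which accessions are not mixed accessions?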

IndigoFloyd commented 7 months ago
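  1. Intersection operation: we take the intersection of the sites in the USDA dataset and the sites in our test/validation set. No other operation is involved.
  2. Mixed accessions: from the original 20,000+ samples, we removed the mixed accessions and the soybeans with missing trait values. The category can be filtered using the subcollection column of the attached Excel sheet: 4_sel_max_soja_information.xlsx
  3. The poor prediction performance may be caused by several factors, such as data preprocessing. I suggest running the test a few more times; if the problem persists, we can continue the discussion.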
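
For readers trying to reproduce points 1 and 2, here is a minimal sketch of the two steps, not the authors' actual preprocessing code. It assumes that a site is identified by a (chromosome, position) pair, that the mixed accessions carry the label `"mixed"` in the `subcollection` column, and that the trait column is named `protein`. The function names, the label, and the sample rows are all hypothetical; only the `subcollection` column name comes from the reply above.

```python
def intersect_sites(usda_sites, eval_sites):
    """Keep only the sites present in both collections.

    Sites are (chromosome, position) tuples (an assumption); the order of
    `usda_sites` is preserved so downstream feature columns stay aligned.
    """
    eval_set = set(eval_sites)
    return [site for site in usda_sites if site in eval_set]


def filter_accessions(rows, trait, excluded_subcollections=("mixed",)):
    """Drop accessions in an excluded subcollection or with a missing trait.

    `rows` is a list of dicts, e.g. rows read from the Excel sheet.
    The "mixed" label is a hypothetical placeholder for the real value.
    """
    excluded = {label.lower() for label in excluded_subcollections}
    kept = []
    for row in rows:
        subcollection = str(row.get("subcollection", "")).strip().lower()
        value = row.get(trait)
        if subcollection in excluded:
            continue  # accession belongs to an excluded category
        if value is None or value == "":
            continue  # trait value is missing
        kept.append(row)
    return kept


# Made-up example data, for illustration only.
usda_sites = [("Chr01", 1001), ("Chr01", 2050), ("Chr02", 300)]
eval_sites = [("Chr01", 2050), ("Chr02", 300), ("Chr03", 77)]
shared = intersect_sites(usda_sites, eval_sites)
print(shared)  # [('Chr01', 2050), ('Chr02', 300)]

rows = [
    {"accession": "A1", "subcollection": "max", "protein": 42.1},
    {"accession": "A2", "subcollection": "mixed", "protein": 40.3},
    {"accession": "A3", "subcollection": "soja", "protein": None},
]
kept = filter_accessions(rows, trait="protein")
print([row["accession"] for row in kept])  # ['A1']
```

Taking the intersection before training keeps the feature columns identical across the USDA data and the test/validation data, which is one thing worth checking when a reproduction diverges from the reported results.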