RenqiChen / Genomic-Selection

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding
Apache License 2.0

missing ./configs/gwas_transformer_rice3k.json #4

Closed qzhai closed 4 weeks ago

qzhai commented 4 weeks ago

Dear,

I tried to test the released Rice3k dataset, but the configuration file for Rice3k is missing; only the wheat configuration file exists. Could you please add the Rice3k parameter file? By the way, will the Wheat3k dataset be released?

Best

RenqiChen commented 4 weeks ago

Hello, the transformer structure for these two datasets is the same, so you only need to change the parameter for the input length (data length). I am afraid I cannot release Wheat3k at present; the reason is given in the README.

RenqiChen commented 4 weeks ago

{
    "width": 32,
    "num_layers": 3,
    "num_heads": 4,
    "hidden_dims": [512, 128],
    "dropout": 0.0,
    "filter_dim": 201740,
    "method": "MLM",
    "tokenizer": "kmer_nonoverlap",
    "max_len": 230000,
    "kmer": 5,
    "class_num": 1
}

Note that you only need to set max_len, which is used in the positional embedding. The other parameters do not need to be changed.
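
For readers who want to script this, here is a minimal sketch of deriving a new config from an existing one by overriding only max_len. The helper name `make_config` and its arguments are assumptions made for illustration; they are not part of the repository.

```python
import json
from pathlib import Path


def make_config(base_path, out_path, max_len):
    """Copy a JSON config, overriding only max_len (hypothetical helper)."""
    config = json.loads(Path(base_path).read_text())
    # max_len is the only value that depends on the dataset's input length.
    config["max_len"] = max_len
    Path(out_path).write_text(json.dumps(config, indent=4) + "\n")
    return config
```

Called with the path of the existing wheat config, `./configs/gwas_transformer_rice3k.json` as the output path, and `max_len=230000`, this should reproduce the values posted above, assuming the wheat file differs only in max_len.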

qzhai commented 4 weeks ago

I appreciate your kind response. I'll try the code with the updated parameters. In dataset_rise3k_45.py, lines 54-57 are over-indented, and the mean/std also need to be estimated for the data loaded from xlsx_path; otherwise an error occurs in __getitem__ of GENE.
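
To illustrate the second point, here is a rough sketch of estimating mean/std once at load time so that item access can normalize values. The class name `NormalizedPhenotypes`, its fields, and the assumption that the normalized values are phenotype targets are all hypothetical; the actual GENE class and its xlsx loading are not shown in this thread.

```python
from statistics import fmean, pstdev


class NormalizedPhenotypes:
    """Hypothetical dataset wrapper: z-score the target values on access."""

    def __init__(self, genotypes, phenotypes):
        if len(genotypes) != len(phenotypes):
            raise ValueError("genotypes and phenotypes must have equal length")
        self.genotypes = genotypes
        self.phenotypes = [float(p) for p in phenotypes]
        # Estimate the statistics at load time so __getitem__ can rely on them.
        self.mean = fmean(self.phenotypes)
        # Fall back to 1.0 for constant data to avoid division by zero.
        self.std = pstdev(self.phenotypes) or 1.0

    def __len__(self):
        return len(self.phenotypes)

    def __getitem__(self, idx):
        target = (self.phenotypes[idx] - self.mean) / self.std
        return self.genotypes[idx], target
```

If the statistics are never set for one of the loading paths, the access in `__getitem__` would fail with an attribute error, which may be the kind of bug described here.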

RenqiChen commented 4 weeks ago

Oh, you are right. The error was probably introduced when I copied the code. Thanks also for your kind advice.