qzhai closed this issue 4 weeks ago
Hello, the transformer structure is the same for both datasets, so you only need to change the parameter for the input length (data length). I am afraid I cannot release Wheat3k at present; the reason is given in the README.
{
"width": 32,
"num_layers": 3,
"num_heads": 4,
"hidden_dims": [512, 128],
"dropout": 0.0,
"filter_dim": 201740,
"method": "MLM",
"tokenizer": "kmer_nonoverlap",
"max_len": 230000,
"kmer": 5,
"class_num": 1
}
Note that you only need to set max_len, which is used in the positional embedding. The other parameters do not need to be changed.
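As a minimal sketch of that advice, the snippet below copies the configuration posted above and replaces only max_len. The helper name with_max_len and the value 100000 are placeholders for illustration; the real Rice3k input length is not stated in this thread and has to be taken from your own data.

```python
import copy
import json

# Configuration exactly as posted in this thread (the Wheat file).
WHEAT_CONFIG = {
    "width": 32,
    "num_layers": 3,
    "num_heads": 4,
    "hidden_dims": [512, 128],
    "dropout": 0.0,
    "filter_dim": 201740,
    "method": "MLM",
    "tokenizer": "kmer_nonoverlap",
    "max_len": 230000,
    "kmer": 5,
    "class_num": 1,
}


def with_max_len(base_config, max_len):
    """Return a copy of base_config in which only max_len is replaced."""
    if not isinstance(max_len, int) or max_len <= 0:
        raise ValueError("max_len must be a positive integer")
    new_config = copy.deepcopy(base_config)  # leave the original untouched
    new_config["max_len"] = max_len
    return new_config


# 100000 is a placeholder, NOT the actual Rice3k input length.
rice_config = with_max_len(WHEAT_CONFIG, 100000)
print(json.dumps(rice_config, indent=2))
```

Writing the result with json.dump to a new file would give a Rice3k parameter file that differs from the Wheat one in a single field.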
I appreciate your kind response. I'll try the code with the updated parameters. Two notes on dataset_rise3k_45.py: lines 54-57 contain excess indentation, and the mean/std also needs to be estimated for the data loaded from xlsx_path; otherwise there will be a bug when getitem is parsed in GENE.
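For readers hitting the same problem, here is a rough sketch of the mean/std point. It is not the repository's code: the class name NormalizedLabels is hypothetical, and it assumes the values from xlsx_path have already been read into a plain list of numbers. The idea is only that the statistics are estimated once at construction time so that item access can use them.

```python
import statistics


class NormalizedLabels:
    """Hypothetical helper: z-score labels with a mean/std estimated once."""

    def __init__(self, values):
        self.values = [float(v) for v in values]
        # Estimate the statistics up front so __getitem__ can rely on them.
        self.mean = statistics.fmean(self.values)
        # Population std; fall back to 1.0 if all values are identical.
        self.std = statistics.pstdev(self.values) or 1.0

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        # Without self.mean / self.std this lookup would fail.
        return (self.values[idx] - self.mean) / self.std


labels = NormalizedLabels([2.0, 4.0, 6.0])
print(len(labels), labels[1])
```

Whether the original code uses the population or the sample standard deviation is not stated here, so that choice is an assumption as well.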
Oh, you are right. The error was probably introduced when the code was copied. Thanks also for your kind advice.
Dear,
I tried to test the released Rice3k dataset, but the configuration file for Rice3k is missing; only the wheat file exists. Could you please add the Rice3k param file? By the way, will Wheat3k be released?
Best