Closed DayeaPark closed 2 months ago
Hi @DayeaPark, Both the min_reads variable in the training config file and the n_reads_per_site variable in the model config should be changed. Sorry for the potential confusion in the documentation. Let us know if that solves your issue.
Thanks for the response, @kristinrma. I am using a conda environment to install m6Anet. Could you let me know where I can find both the training config and the model config? I created a model config and provided the file when running m6Anet inference, but I cannot find where to change the training config.
I guess I can train the model again by providing a training config to m6Anet-train. However, I have a problem at this step. In the training config, I need to edit root_dir and norm_path. Since I am using your preset data, I used your norm_path data (norm_factors_hct116.joblib), but I cannot find the labelled file for root_dir. According to your description, root_dir must contain a data.info.labelled file and a data.json file. Could you let me know where I can find those files?
I just want to use your pre-trained model with the n_read threshold set to 1. If you have any idea how to solve this problem without re-training the model, please let me know. Thank you.
My training config looks like this.

[loss_function]
loss_function_type = "binary_cross_entropy_loss"

[dataset]
root_dir = "/path/to/m6anet/m6anet/tests/data/"
min_reads = 10
norm_path = "/path/to/m6anet/m6anet/model/norm_factors/norm_factors_hct116.joblib"
num_neighboring_features = 1

[dataloader]
[dataloader.train]
batch_size = 256
sampler = "ImbalanceOverSampler"

[dataloader.val]
batch_size = 256
shuffle = false

[dataloader.test]
batch_size = 256
shuffle = false
Thanks for your help!
Hi @DayeaPark,
Glad you were able to find the sample training config and modify it. To replicate the pre-trained model with the minimum number of reads at 10, I would suggest running m6Anet dataprep using the SGNex Hct116 Rep2 Run1 dataset as this was the original dataset used to train m6Anet. A tutorial on how to retrieve files from the SGNex AWS S3 bucket can be found here https://github.com/GoekeLab/sg-nex-data/blob/master/docs/AWS_data_access_tutorial.md. After you generate the data prep files, you can follow the m6Anet training documentation to create data.info.labelled from your data.info set; then set root_dir to your dataprep folder. Hope this helps.
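The labelling step can be sketched roughly as follows. This is only an illustration: the column names (`modification_status`, `set_type`), the `data.info` layout, the transcript IDs, and the labels are all assumptions made for the example, so check the m6Anet training documentation for the exact format it expects. `label_data_info` is a hypothetical helper, not an m6Anet function.

```python
import csv
import io

# Hypothetical data.info content; real files come from m6Anet dataprep.
DATA_INFO = """transcript_id,transcript_position,start,end,n_reads
ENST0001,100,0,500,25
ENST0001,250,500,900,12
ENST0002,40,900,1300,31
"""

# Hypothetical labels keyed by (transcript_id, transcript_position),
# and a hypothetical train/test split keyed by transcript_id.
LABELS = {("ENST0001", "100"): 1, ("ENST0001", "250"): 0, ("ENST0002", "40"): 0}
SPLITS = {"ENST0001": "Train", "ENST0002": "Test"}


def label_data_info(info_csv: str) -> str:
    """Append assumed label and split columns to every row of data.info."""
    reader = csv.DictReader(io.StringIO(info_csv))
    fieldnames = list(reader.fieldnames) + ["modification_status", "set_type"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        key = (row["transcript_id"], row["transcript_position"])
        row["modification_status"] = LABELS[key]
        row["set_type"] = SPLITS[row["transcript_id"]]
        writer.writerow(row)
    return out.getvalue()


labelled = label_data_info(DATA_INFO)
print(labelled)
```

The resulting text would be saved as data.info.labelled next to data.json in the folder that root_dir points to.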
Hi.
I am currently running m6Anet with the pre-trained model (Hct116_RNA002). I wonder if there is any way to change n_reads_per_site from 20 to 10. I changed n_reads_per_site in the model toml file (prod_pooling.toml) but got the error below. Any help with modifying the read threshold would be appreciated. Thank you.
ValueError: Length of values (86428) does not match length of index (43214)
My modified toml file looks like this.

model = "prod_sigmoid_pooling"

[[block]]
block_type = "DeaggregateNanopolish"
num_neighboring_features = 1

[[block]]
block_type = "KmerMultipleEmbedding"
input_channel = 66
output_channel = 2
num_neighboring_features = 1

[[block]]
block_type = "ConcatenateFeatures"

[[block]]
block_type = "Linear"
input_channel = 15
output_channel = 150
activation = "relu"
batch_norm = true

[[block]]
block_type = "Linear"
input_channel = 150
output_channel = 32
activation = "relu"
batch_norm = false

[[block]]
block_type = "SigmoidProdPooling"
input_channel = 32
n_reads_per_site = 10
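One observation about the ValueError reported with this config, offered as a hint rather than a confirmed diagnosis: the two lengths in the message differ by exactly the same factor as the original and modified n_reads_per_site values. That would be consistent with some other part of the pipeline still supplying 20 reads per site while the pooling block now groups them in tens, although the thread does not confirm this. The arithmetic, using only numbers quoted in the post:

```python
from fractions import Fraction

values_len = 86428   # "Length of values" from the error message
index_len = 43214    # "length of index" from the error message
default_reads = 20   # original n_reads_per_site, per the post
modified_reads = 10  # value set in the modified model config

length_ratio = Fraction(values_len, index_len)
reads_ratio = Fraction(default_reads, modified_reads)

print(length_ratio, reads_ratio, length_ratio == reads_ratio)
```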