GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License
103 stars, 19 forks

Error in m6anet-inference step #8

Closed: bhargava-morampalli closed this issue 2 years ago.

bhargava-morampalli commented 3 years ago

Hi Chris,

Thanks for releasing the new model for m6anet. I have run the nanopolish and data-prep steps successfully, but I am encountering an error in the inference step. Here's the command I used and the error that I got.

[Screenshot, 2021-05-13 at 12:50:52 PM: the command and the resulting error (image not included)]

Here are the contents of the m6anet data-prep output directory.

data.index  data.json  data.log  data.readcount  eventalign.index 

And here are the first few lines of each file.

data.index

transcript_id,transcript_position,start,end
gnl|X|GEFLOABG_1,495,0,241
gnl|X|GEFLOABG_1,554,241,805
gnl|X|GEFLOABG_1,566,805,1074
gnl|X|GEFLOABG_1,609,1074,1460
gnl|X|GEFLOABG_1,641,1460,1702
gnl|X|GEFLOABG_1,794,1702,1935
gnl|X|GEFLOABG_1,929,1935,2312
gnl|X|GEFLOABG_1,1276,2312,2415
gnl|X|GEFLOABG_1,2798,2415,2768
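The start and end columns of data.index appear to be byte offsets into data.json: the first record spans bytes 0 to 241, which matches the length of the first JSON line below including its newline. Assuming that layout, one site can be read without loading the whole file. The helper names `load_index` and `read_site` are hypothetical, not part of m6anet.

```python
import csv
import io
import json
from typing import Dict, Iterable, Tuple

SiteKey = Tuple[str, int]


def load_index(lines: Iterable[str]) -> Dict[SiteKey, Tuple[int, int]]:
    """Map (transcript_id, transcript_position) to (start, end) byte offsets."""
    offsets = {}
    for row in csv.DictReader(lines):
        key = (row["transcript_id"], int(row["transcript_position"]))
        offsets[key] = (int(row["start"]), int(row["end"]))
    return offsets


def read_site(fh: io.BufferedIOBase, start: int, end: int) -> dict:
    """Seek to one record in data.json (opened in binary mode) and parse it.

    Assumes [start, end) covers exactly one JSON line, as the index suggests.
    """
    fh.seek(start)
    raw = fh.read(end - start)
    return json.loads(raw)
```

For example, with `data.index` opened in text mode and `data.json` opened with `"rb"`, `read_site(fh, *load_index(idx)[("gnl|X|GEFLOABG_1", 495)])` should return the position-495 record.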

data.json

{"gnl|X|GEFLOABG_1":{"495":{"AAAACCA":[[0.01184074074074074,2.4917777777777776,110.4,0.00564,3.395,103.8,0.00752,2.2332285714285716,83.8],[0.01029,3.492,106.7,0.0156,4.1739999999999995,106.2,0.008974782608695654,1.8518405797101452,85.8]]}}}
{"gnl|X|GEFLOABG_1":{"554":{"CGAACTT":[[0.003320000000000001,6.228,113.0,0.003519512195121951,2.2558536585365854,98.7,0.00548,1.915878787878788,94.2],[0.0046864,5.01176,111.9,0.005838703703703704,2.561796296296296,98.3,0.004930512820512821,1.81474358974359,93.4],[0.010001636363636365,4.834199999999999,116.8,0.00299,2.9989999999999997,96.2,0.010620000000000001,2.162,91.2],[0.008946176470588235,5.474882352941177,116.5,0.018920000000000003,3.4560000000000004,102.8,0.00465,1.288,92.7],[0.00498,3.87,112.3,0.00598,2.3705,100.1,0.00299,1.9419999999999997,93.7]]}}}
{"gnl|X|GEFLOABG_1":{"566":{"GGGACTC":[[0.00465,2.62,125.0,0.011222000000000001,2.3234,126.6,0.0070901960784313725,4.266843137254902,91.4],[0.00232,3.6639999999999993,118.4,0.023049523809523808,8.358790476190476,126.5,0.0033529999999999996,3.2972499999999996,91.1]]}}}
{"gnl|X|GEFLOABG_1":{"609":{"AAAACTT":[[0.0069680930232558155,2.397232558139535,110.0,0.006640000000000002,3.582,112.1,0.00465,3.0060000000000002,97.6],[0.01727975,3.17975,112.6,0.004538723404255319,2.678808510638298,111.9,0.005993611111111112,1.6218055555555557,90.4],[0.010791319148936171,2.3994510638297872,111.8,0.00465,2.253,108.8,0.0073202272727272725,2.005272727272727,95.2]]}}}
{"gnl|X|GEFLOABG_1":{"641":{"AAAACAT":[[0.01029,3.805,104.3,0.0166,4.933,96.9,0.004095833333333333,2.7229166666666664,86.7],[0.00797,1.8795,111.6,0.009260961538461539,2.988711538461539,101.2,0.0063100000000000005,3.5839999999999996,88.1]]}}}
{"gnl|X|GEFLOABG_1":{"794":{"AAAACTG":[[0.00996,5.541,111.1,0.007640000000000001,6.316,107.9,0.005605757575757576,2.3050909090909095,91.1],[0.00365,3.0589999999999993,109.3,0.00266,2.226,100.3,0.0037576000000000003,2.78356,93.4]]}}}
{"gnl|X|GEFLOABG_1":{"929":{"CGAACTG":[[0.005724387755102042,4.217897959183674,103.9,0.006640000000000002,4.3839999999999995,100.0,0.003927636363636363,2.333509090909091,94.9],[0.010001636363636365,5.474981818181819,116.8,0.00232,1.39,102.4,0.00797,1.676,97.8],[0.011672112676056338,7.767183098591549,118.6,0.00299,2.261,104.7,0.008669074074074076,2.5933148148148146,94.3]]}}}
{"gnl|X|GEFLOABG_1":{"1276":{"ATAACAT":[[0.00365,1.327,87.2,0.00299,1.148,91.1,0.00365,2.305,96.9]]}}}
{"gnl|X|GEFLOABG_1":{"2798":{"CTGACAT":[[0.016830176991150442,3.0475840707964603,107.3,0.00498,13.562000000000001,115.8,0.0234909649122807,2.8029298245614034,81.8],[0.00365,2.01,105.9,0.0093,8.011000000000001,113.2,0.0083,2.9219999999999997,78.3],[0.010674254545454544,3.1419200000000003,106.3,0.0049552,9.74708,109.6,0.007640000000000001,3.4,83.2]]}}}
{"gnl|X|GEFLOABG_1":{"2872":{"GTGACAC":[[0.006475897435897436,3.2191025641025637,98.5,0.006076666666666667,9.85124,115.7,0.0052285185185185195,2.837111111111111,81.5],[0.008934782608695653,4.1631847826086945,103.2,0.011549135514018692,5.914228971962616,110.3,0.005172222222222222,4.274511111111111,84.8]]}}}
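Each data.json line appears to hold one site: transcript ID, position, 7-mer, and one list per read. In this excerpt every per-read list has 9 numbers, and the number of lists per site matches n_reads in data.readcount (2 reads at 495, 5 at 554, and so on). The sketch below counts reads per site and checks the 9-value shape; splitting each read into three (dwell time, standard deviation, mean current) triplets is an assumption based on the value ranges, not something stated in this thread.

```python
import json
from typing import Dict, Iterable, List, Tuple

FEATURES_PER_READ = 9  # observed in the excerpt; column meanings are assumed


def reads_per_site(lines: Iterable[str]) -> Dict[Tuple[str, int, str], int]:
    """Count reads per (transcript_id, position, kmer) from data.json lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        for tx_id, positions in record.items():
            for pos, kmers in positions.items():
                for kmer, reads in kmers.items():
                    if any(len(r) != FEATURES_PER_READ for r in reads):
                        raise ValueError(
                            f"{tx_id}:{pos} has a read without {FEATURES_PER_READ} values"
                        )
                    counts[(tx_id, int(pos), kmer)] = len(reads)
    return counts


def split_triplets(read: List[float]) -> List[Tuple[float, ...]]:
    """Group a 9-value read into three consecutive triplets (assumed ordering)."""
    return [tuple(read[i:i + 3]) for i in range(0, len(read), 3)]
```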

data.log

gnl|X|GEFLOABG_1: Data preparation ... Done.

data.readcount

transcript_id,transcript_position,n_reads
gnl|X|GEFLOABG_1,495,2
gnl|X|GEFLOABG_1,554,5
gnl|X|GEFLOABG_1,566,2
gnl|X|GEFLOABG_1,609,3
gnl|X|GEFLOABG_1,641,2
gnl|X|GEFLOABG_1,794,2
gnl|X|GEFLOABG_1,929,3
gnl|X|GEFLOABG_1,1276,1
gnl|X|GEFLOABG_1,2798,3
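Later in this thread the maintainer notes that, by default, m6anet requires at least 20 reads per position. Taking that threshold as given, a quick summary of data.readcount shows how many sites could be scored. In the excerpt above no position has more than 5 reads, so none would pass. `summarise_readcount` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable

DEFAULT_MIN_READS = 20  # default quoted by the maintainer in this thread


def summarise_readcount(lines: Iterable[str], min_reads: int = DEFAULT_MIN_READS) -> Dict[str, int]:
    """Count sites, the maximum depth, and how many sites reach min_reads."""
    total = passing = max_reads = 0
    for row in csv.DictReader(lines):
        n = int(row["n_reads"])
        total += 1
        max_reads = max(max_reads, n)
        if n >= min_reads:
            passing += 1
    return {"sites": total, "max_reads": max_reads, "sites_passing": passing}
```

Usage: `with open("data.readcount", newline="") as fh: print(summarise_readcount(fh))`.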

eventalign.index

transcript_id,read_index,pos_start,pos_end
gnl|X|GEFLOABG_1,9,172,40517
gnl|X|GEFLOABG_1,17,40517,124637
gnl|X|GEFLOABG_1,4,124637,184631
gnl|X|GEFLOABG_1,15,184631,306199
gnl|X|GEFLOABG_1,12,306199,361953
gnl|X|GEFLOABG_1,27,361953,440434
gnl|X|GEFLOABG_1,5,440434,546348
gnl|X|GEFLOABG_1,16,546348,605586
gnl|X|GEFLOABG_1,21,605586,686048
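In eventalign.index, pos_start and pos_end look like byte offsets into the nanopolish eventalign output: each row starts where the previous one ends, and the first row starts at byte 172, presumably just after a header line. Under that assumption, the hypothetical check below sorts rows by offset, verifies that they tile one contiguous range with no gaps or overlaps, and reports how many bytes each read occupies.

```python
import csv
from typing import Iterable, List, Tuple


def read_byte_spans(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (transcript_id, read_index, n_bytes) sorted by pos_start.

    Raises ValueError if a span is empty or if sorted spans leave a gap or overlap.
    """
    rows = sorted(
        (int(r["pos_start"]), int(r["pos_end"]), r["transcript_id"], int(r["read_index"]))
        for r in csv.DictReader(lines)
    )
    for start, end, _, read in rows:
        if end <= start:
            raise ValueError(f"empty span for read {read}: {start}..{end}")
    # Sorting first means the check does not depend on the order rows were written.
    for (_, prev_end, _, _), (start, _, _, read) in zip(rows, rows[1:]):
        if start != prev_end:
            raise ValueError(f"gap or overlap before read {read}: {prev_end} -> {start}")
    return [(tx, read, end - start) for start, end, tx, read in rows]
```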

Happy to provide more details if needed. Please let me know how to resolve this error.

chrishendra93 commented 3 years ago

Hi @bhargava-morampalli, may I ask whether you modified some of the code? I think the script name should be m6anet-run_inference. In any case, it seems that a None value is being passed to os.path.join somewhere in the script. Is it possible that the problem is that you used -i and -o instead of --input_dir and --output_dir, so the command line did not recognize the input and output directory arguments?
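The failure mode described here can be illustrated with a generic argparse sketch. This is a hypothetical parser, not m6anet's actual code: an optional argument that is never supplied defaults to None, and `os.path.join(None, ...)` raises a TypeError. Declaring the option with `required=True` makes argparse stop early with a usage message (exit code 2) instead.

```python
import argparse
import os
from typing import List


def build_parser(required: bool) -> argparse.ArgumentParser:
    """Hypothetical parser with only the --input_dir option discussed in the thread."""
    parser = argparse.ArgumentParser(prog="inference-sketch")
    parser.add_argument("--input_dir", required=required)
    return parser


def data_json_path(argv: List[str], required: bool = False) -> str:
    args = build_parser(required).parse_args(argv)
    # When --input_dir is absent and not required, args.input_dir is None,
    # and os.path.join raises TypeError on the None argument.
    return os.path.join(args.input_dir, "data.json")
```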

bhargava-morampalli commented 3 years ago

Oh, sorry, I just followed the quick start guide. So I should use m6anet-run_inference instead of m6anet-inference? This is the command from the quick start:

m6anet-inference -input_dir demo_data --out_dir demo_data ---n_processes 4

I have not touched any code. Also, for m6anet-inference, --input_dir and --out_dir did not work; that's why I changed them to -i and -o. I will try m6anet-run_inference and let you know if it works.

bhargava-morampalli commented 3 years ago

Okay, I ran m6anet-run_inference and got the data.result.csv.gz file. It is very small, only a few bytes. Is this normal? Also, is --n_processes for setting the number of threads?

chrishendra93 commented 3 years ago

Ah, I see. It's strange that m6anet-inference still runs; perhaps I forgot to clear some files or a cache, so I'll check that on my end. Apologies for the typo in the documentation. I have just updated it so that others won't use the wrong command. Thanks!

Anyway, you are right: --n_processes sets the number of threads. Also, by default m6anet requires each position to have at least 20 reads. Could you tell me the size of the data.readcount file (how many rows, and whether many positions have at least 20 reads)? Also, can you check whether the entries inside data.result.csv.gz make sense?
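To check whether a data.result.csv.gz of only a few bytes holds any results, one can count its data rows. A file that contains just a header would be consistent with no position reaching the 20-read default. The sketch assumes the file is a gzip-compressed CSV with one header row and does not assume any particular column names; `peek_results` is a hypothetical helper.

```python
import csv
import gzip
from typing import List, Tuple


def peek_results(path: str, n: int = 5) -> Tuple[List[str], int, List[List[str]]]:
    """Return (header, number of data rows, first n data rows) of a gzipped CSV."""
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])  # empty list if the file has no lines at all
        count = 0
        first: List[List[str]] = []
        for row in reader:
            if count < n:
                first.append(row)
            count += 1
    return header, count, first
```

Usage: `header, n_rows, first = peek_results("data.result.csv.gz")`; `n_rows == 0` means only a header was written.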

bhargava-morampalli commented 3 years ago

That's great, thanks. I will check data.readcount and the entries in data.result.csv.gz and let you know the results.

chrishendra93 commented 3 years ago

Oh right, please try --input_dir instead of -input_dir, which is what the documentation stated before. That was a typo, and I have corrected it in the documentation. Again, sorry about this.

chrishendra93 commented 3 years ago

Hi @bhargava-morampalli, have you managed to run this successfully? If so, I would like to close this issue; otherwise, please let me know about any problems you are facing with running m6anet.