GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License

Few m6a sites detected #35

Closed: MiphaZ closed this issue 2 years ago

MiphaZ commented 2 years ago

Hi, I downloaded the HEK293T-WT data, which includes an eventalign result from xPore; it was prepared the same way as in your article. I ran m6anet-dataprep and got these warnings:

/home/yuan/.local/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:142: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  chunk_split['line_length'] = np.array(lines)
/home/yuan/.local/lib/python3.9/site-packages/m6anet/scripts/dataprep.py:100: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()

Then I ran m6anet-run_inference and found only 33 m6A sites. Here are my commands:

m6anet-dataprep --eventalign eventalign.txt --out_dir m6a --n_processes 20
m6anet-run_inference --input_dir m6a --out_dir m6a --n_processes 10

Could you give me some suggestions about this issue? Thank you very much!

chrishendra93 commented 2 years ago

Hi @MiphaZ,

Could you tell me how you are thresholding the m6A sites? Also, how many entries are there in the data.readcount file?
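For context, "thresholding" here means keeping only the sites whose predicted modification probability is at or above some cutoff. A minimal sketch of that idea, assuming the inference output is a CSV with a probability_modified column; the column name, the function name filter_sites, the 0.9 cutoff, and the example rows are all assumptions for illustration and should be checked against the actual output file.

```python
import csv
import io


def filter_sites(handle, cutoff, column="probability_modified"):
    """Keep rows whose probability column is >= cutoff (column name is an assumption)."""
    return [row for row in csv.DictReader(handle) if float(row[column]) >= cutoff]


# Made-up rows for illustration only; this is not real m6anet output.
example = io.StringIO(
    "transcript_id,transcript_position,n_reads,probability_modified\n"
    "TX1,100,25,0.95\n"
    "TX1,250,40,0.10\n"
    "TX2,75,31,0.90\n"
)
kept = filter_sites(example, cutoff=0.9)
print(len(kept), "of 3 example sites pass the cutoff")
```

Counting every row in the output, regardless of probability, corresponds to applying no cutoff at all.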

MiphaZ commented 2 years ago

Thanks for your reply. The data.readcount file contains 150 lines; here is part of it:

transcript_id,transcript_position,n_reads
ENST00000210444,347,1
ENST00000210444,441,1
ENST00000210444,476,1
ENST00000210444,695,1
ENST00000210444,787,1
ENST00000210444,794,1
ENST00000210444,815,1
ENST00000210444,843,1

I don't quite understand the threshold you are referring to (please forgive my poor English), but I just used the default settings from the 'quick start', and the total number of sites, including those with a low probability of modification, is 33.

chrishendra93 commented 2 years ago

There should be more than 150 lines if you preprocess the entire dataset. Is this the eventalign result from the full HEK293T dataset? I think my data.readcount for the HEK293T dataset contains roughly 80,000 rows with >= 20 reads.
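A quick way to check this is to count the rows in data.readcount and how many of them reach a given read count. The sketch below relies only on the three columns shown in the header posted earlier in this thread (transcript_id, transcript_position, n_reads). The function name summarize_readcount and its min_reads parameter are illustrative and not part of m6anet; the default of 20 simply mirrors the figure mentioned above.

```python
import csv
import io


def summarize_readcount(handle, min_reads=20):
    """Return (total_rows, rows_with_at_least_min_reads) for a data.readcount-style CSV."""
    total = 0
    passing = 0
    for row in csv.DictReader(handle):
        total += 1
        if int(row["n_reads"]) >= min_reads:
            passing += 1
    return total, passing


# Sample rows copied from the excerpt posted earlier in this thread.
sample = io.StringIO(
    "transcript_id,transcript_position,n_reads\n"
    "ENST00000210444,347,1\n"
    "ENST00000210444,441,1\n"
    "ENST00000210444,476,1\n"
)
total, passing = summarize_readcount(sample)
print(f"{total} positions, {passing} with >= {20} reads")
```

For a real run, pass an open file handle instead of the in-memory sample, for example the data.readcount file in the --out_dir used above (assuming that is where dataprep wrote it).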

chrishendra93 commented 1 year ago

Hi YuanMei, this is demo data; the original HEK293T dataset is hosted here: https://www.ebi.ac.uk/ena/browser/view/PRJEB40872?show=reads

On Mon, Apr 18, 2022 at 5:43 PM YuanMei @.***> wrote:

Does this demo data contain only part of the HEK293T dataset?


kwonej0617 commented 1 year ago

Hi @chrishendra93,

Is there a way to remove the >= 20 reads threshold, or to change it? Thanks!