comprna / CHEUI

Concurrent identification of m6A and m5C modifications in individual molecules from nanopore sequencing

Column header _position_ in site_level_prediction #24

Closed gsukrit closed 4 months ago

gsukrit commented 1 year ago

Hi team,

I would like some clarity on the position column in the site_level_prediction output file (CHEUI solo outputs). Does it indicate the position of the modified base relative to the start of that particular transcript ID, or the genomic coordinate where the modified base is predicted? If it is the position relative to the start of the transcript, does counting start from the mRNA coordinates on the genome (that is, from the 5' UTR region)?

Any further clarification would be highly appreciated.

Thanks,

Sukriti

EduEyras commented 1 year ago

Hi,

Thanks for the email.

The position is the coordinate (half-open) of the 9-mer in the transcript where the prediction is made.

That is, if the 9-mer is NNNNANNNN and its location is

5' XXXNNNNANNNNXXX... 3'

The coordinate should be 2, i.e.:

   0123456...
5' XXXNNNNANNNNXXX... 3'

I cc Akanksha and Favour, who will be able to confirm.

Thanks

Eduardo

gsukrit commented 1 year ago

Thank you for your response. Can you please suggest a way to map this position to the exact coordinate on the genome from the GFF file, i.e. to the 5' UTR / CDS / 3' UTR? Or to map it to the exact nucleotide of that particular transcript?

Thanks,

Sukriti

Akanksha2511 commented 1 year ago

Hi Sukriti,

The position is the position of the first nucleotide in the site column. For example:

contig              position  site       coverage  stoichiometry  probability
ENST00000000233.10  1003      CTTGAGTAA  648       0.10132158     0.11857438

Here 1003 is the position of the C of the site CTTGAGTAA in the transcript ENST00000000233.10. To get the position of the center nucleotide, which in this case is A, you add 5. So the prediction is made for the center nucleotide at position + 5. I hope it helps.

Thanks, Akanksha
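As a minimal sketch of the rule described above, the following Python snippet reads a site-level table and reports the center nucleotide of each 9-mer together with position + 5. It assumes the file is tab-separated with the column names shown in the example row; the names `CENTER_OFFSET` and `center_sites` are made up for illustration and are not part of CHEUI. The thread does not state whether the resulting number is 0-based or 1-based, so check that against your own transcript sequences before relying on it.

```python
import csv
import io

# Example row copied from this thread; tab separation is an assumption.
EXAMPLE = (
    "contig\tposition\tsite\tcoverage\tstoichiometry\tprobability\n"
    "ENST00000000233.10\t1003\tCTTGAGTAA\t648\t0.10132158\t0.11857438\n"
)

# Offset quoted in this thread ("position + 5"); not verified against CHEUI code.
CENTER_OFFSET = 5


def center_sites(handle):
    """Yield (contig, center_position, center_base) for each table row."""
    for row in csv.DictReader(handle, delimiter="\t"):
        site = row["site"]
        center_base = site[len(site) // 2]  # middle base of the 9-mer
        yield row["contig"], int(row["position"]) + CENTER_OFFSET, center_base


results = list(center_sites(io.StringIO(EXAMPLE)))
print(results)
```

For a real run, replace `io.StringIO(EXAMPLE)` with an open file handle on the site-level prediction file.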

gsukrit commented 1 year ago

Thank you for the information. That really gave some clarity. Can you suggest a possible approach or method to map these predicted positions to the exact gene coordinates and classify them as UTR / CDS?

Thanks for the assistance.

Regards,

Sukriti

EduEyras commented 1 year ago

Hi, yes,

Please have a look at our tool: https://github.com/comprna/R2Dtool

Here is the preprint: https://www.biorxiv.org/content/10.1101/2022.09.23.509222v1

I cc AJ, who wrote the software. Please let us know if you have any problems with it.

Best,

Eduardo
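To illustrate what such a mapping involves, here is a rough Python sketch that lifts a transcript position to a genomic coordinate and labels it as 5' UTR, CDS, or 3' UTR. This is not how R2Dtool is implemented; it is only a toy under stated assumptions. The `Transcript` class, its fields, and the functions `to_genome` and `region` are hypothetical, exon and CDS boundaries are assumed to have been read from a GTF/GFF file already, and all coordinates are assumed to be 1-based and inclusive, so a CHEUI position would first need converting to a 1-based center coordinate.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical transcript model; not an R2Dtool or CHEUI data structure."""
    chrom: str
    strand: str      # "+" or "-"
    exons: list      # genomic (start, end) pairs, 1-based inclusive
    cds_start: int   # first CDS base, 1-based transcript coordinate
    cds_end: int     # last CDS base, 1-based transcript coordinate


def to_genome(tx, tpos):
    """Map a 1-based transcript position to a 1-based genomic position."""
    if tpos < 1:
        raise ValueError("transcript positions are 1-based here")
    # Walk exons in transcription order: ascending on "+", descending on "-".
    exons = sorted(tx.exons, reverse=(tx.strand == "-"))
    remaining = tpos
    for start, end in exons:
        length = end - start + 1
        if remaining <= length:
            if tx.strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the transcript end")


def region(tx, tpos):
    """Classify a 1-based transcript position relative to the CDS."""
    if tpos < tx.cds_start:
        return "5UTR"
    if tpos > tx.cds_end:
        return "3UTR"
    return "CDS"


# Made-up two-exon transcript, 200 nt long, used only to exercise the functions.
tx_plus = Transcript("chr1", "+", [(100, 199), (300, 399)], cds_start=51, cds_end=150)
tx_minus = Transcript("chr1", "-", [(100, 199), (300, 399)], cds_start=51, cds_end=150)

print(to_genome(tx_plus, 120), region(tx_plus, 120))
print(to_genome(tx_minus, 120), region(tx_minus, 120))
```

On the minus strand the transcript starts at the highest genomic coordinate, which is why the exons are walked in reverse order there.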

gsukrit commented 1 year ago

Hi team,

So I tried running the suggested R2Dtool. The CHEUI output file (site_level_5mC_prediction) looked like this:

image

The command bash cheui_to_bed.sh [cheui model II output file] [cheui_to_bed output file] created a file that looked like this:

image

For some reason it hasn't copied the complete gene accession IDs. That is probably why the command Rscript ./scripts/R2_annotate.R ./test/out_CHEUI_modelII.bed ./test/GRCm39_subset.gtf ./test/out_CHEUI_modelII_annotated.bed gave the following error:

image

Please let me know what went wrong and how to proceed further.

Thank you for your response and efforts,

Sukriti

pre-mRNA commented 4 months ago

Hi Sukriti, apologies for the delay.

You can try the latest version of R2Dtool, which should resolve this error:

https://github.com/comprna/R2Dtool