Closed gsukrit closed 4 months ago
Hi,
Thanks for the email.
The position is the (half-open) coordinate, within the transcript, of the 9-mer for which the prediction is made.
That is, if the 9-mer is NNNNANNNN and its location is
5' XXXNNNNANNNNXXX... 3'
The coordinate should be 2, i.e.:
0123456... 5' XXXNNNNANNNNXXX... 3'
I cc Akanksha and Favour, who will be able to confirm.
Thanks
Eduardo
On Tue, 1 Aug 2023 at 00:39, gsukrit @.***> wrote:
Hi team,
I would like some clarity on the position column in the output file of site_level_prediction (CHEUI solo outputs). Does it indicate the position of the modified base from the start of that particular transcript ID, or the coordinate on the genome where the modified base is predicted? If it is the position from the start of that particular transcript, does it start from the mRNA coordinates (beginning at the 5' UTR) on the genome?
Any further insight on this would be highly appreciated.
Thanks,
Sukriti
Thank you for your response. Can you please suggest a way to map this position to the exact coordinate on the genome from the GFF file, i.e. to the 5' UTR, CDS or 3' UTR? Or to map it to the exact nucleotide of that particular transcript?
Thanks,
Sukriti
Hi Sukriti,
The position is the position of the first nucleotide in the site column. For example:
contig              position  site       coverage  stoichiometry  probability
ENST00000000233.10  1003      CTTGAGTAA  648       0.10132158     0.11857438
1003 is the position of the C that begins the site CTTGAGTAA in the transcript ENST00000000233.10. To get the position of the center nucleotide, which in this case is A, add 5. The prediction is made for that center nucleotide, at position + 5. I hope it helps.
Thanks, Akanksha
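The arithmetic above can be sketched in a few lines of Python. This is only an illustration under one assumption that the thread does not explicitly confirm: that position is the 0-based start of the 9-mer (as the half-open description earlier suggests), so that adding 4 gives the 0-based index of the center and adding 5 gives its 1-based coordinate. The function name center_of_site is hypothetical and not part of CHEUI.

```python
def center_of_site(position, site):
    """Return (center_base, center_0based, center_1based) for a CHEUI site.

    Assumption (not confirmed by the thread): `position` is the 0-based
    transcript coordinate of the first nucleotide of `site`.
    """
    if len(site) % 2 == 0:
        raise ValueError("site must have odd length to have a single center")
    offset = len(site) // 2            # 4 for a 9-mer
    center_0based = position + offset
    center_1based = center_0based + 1  # position + 5 for a 9-mer
    return site[offset], center_0based, center_1based


# Row taken from the example above (whitespace-separated fields).
row = "ENST00000000233.10 1003 CTTGAGTAA 648 0.10132158 0.11857438"
contig, position, site = row.split()[:3]
base, c0, c1 = center_of_site(int(position), site)
print(contig, base, c0, c1)  # ENST00000000233.10 A 1007 1008
```

If position turns out to be 1-based instead, the center would sit at position + 4, so it is worth checking one known site against the transcript sequence before relying on either offset.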
Thank you for the information. That really gave some clarity. Can you suggest a possible approach or method to map these predicted positions to the exact genomic coordinates and classify them as UTR or CDS?
Thanks for the assistance.
Regards,
Sukriti
Hi, yes,
Please have a look at our tool: https://github.com/comprna/R2Dtool
Here is the preprint: https://www.biorxiv.org/content/10.1101/2022.09.23.509222v1
I cc AJ, who wrote the software. Please let us know if you have any problems with it.
Best,
Eduardo
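For readers who want to see the idea behind this kind of mapping (transcript position to genomic coordinate, then 5'UTR/CDS/3'UTR), here is a minimal Python sketch. It is not R2Dtool's implementation: the function names, exon coordinates and CDS bounds are hypothetical, and it assumes 0-based half-open coordinates with exons listed in 5' to 3' order along the transcript.

```python
def transcript_to_genome(tpos, exons, strand):
    """Map a 0-based transcript position to a 0-based genomic position.

    exons: list of (start, end) genomic intervals, 0-based half-open,
           ordered 5'->3' along the transcript (descending on '-' strand).
    """
    remaining = tpos
    for start, end in exons:
        length = end - start
        if remaining < length:
            if strand == "+":
                return start + remaining
            return end - 1 - remaining
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


def classify_region(tpos, cds_start, cds_end):
    """Classify a transcript position given CDS bounds expressed in
    transcript coordinates (0-based, half-open)."""
    if tpos < cds_start:
        return "5'UTR"
    if tpos < cds_end:
        return "CDS"
    return "3'UTR"


# Hypothetical two-exon transcript on the plus strand.
exons = [(100, 150), (200, 260)]
print(transcript_to_genome(10, exons, "+"))           # 110
print(transcript_to_genome(55, exons, "+"))           # 205
print(classify_region(55, cds_start=20, cds_end=90))  # CDS
```

In practice the exon and CDS intervals would come from the GTF/GFF annotation of the transcript, and a dedicated tool handles the edge cases (multiple isoforms, missing CDS records) that this sketch ignores.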
Hi team,
So I tried running the suggested R2Dtool. The CHEUI output file (site_level_5mC_prediction) looked like this:
The command
bash cheui_to_bed.sh [cheui model II output file] [cheui_to_bed output file]
created a file that looked like this:
For some reason it hasn't copied the complete gene accession IDs. That is probably why
Rscript ./scripts/R2_annotate.R ./test/out_CHEUI_modelII.bed ./test/GRCm39_subset.gtf ./test/out_CHEUI_modelII_annotated.bed
gave the following error:
Please let me know what went wrong and how to proceed further.
Thank you for your response and efforts,
Sukriti
Hi Sukriti, apologies for the delay.
You can try the latest version of R2Dtool, which should resolve this error: