JACUJA2 Code Example Request

Chan-W00 commented 5 months ago

Dear JACUSA2 team,

I am a truely new to bioinformatics and have been trying to analyze C-to-T base editing that is mediated APOBEC1 enzyme directly conjugated to protein of interest (POI).

My situation is:

Bulk RNA-seq from HEK293 transiently transfected with POI-APOBEC1 or mCherry-APOBEC1 in 150 PE
Biological duplicates
Finished pre-processing, such as trimming (cutadopt), alignment (STAR), sorting (samtools), deduplication (Picard), MD tagging (samtools) and indexing (samtools)
Once tried with java -jar JACUSA_v2.0.4.jar call-2 -r result.out cond.1_rep.1.bam,cond.1_rep.2.bam cond.2_rep.1.bam,cond.2_rep.2.bam -p 4
I got following result (Several first strigs):
contig start end name score strand bases11 bases12 bases21 bases22 info filter ref

chr1 14521 14522 call-2 0.0639599068288419 . 1,0,15,0 0,0,12,0 1,0,13,0 1,0,21,0 G chr1 14522 14523 call-2 0.46276221056984923 . 0,16,0,0 0,12,0,0 0,15,0,0 0,21,0,1 C chr1 14541 14542 call-2 0.12227137530251753 . 17,0,2,0 13,0,1,0 11,0,4,0 23,0,1,0 A chr1 14573 14574 call-2 0.20814993206096233 . 24,0,8,0 22,0,1,0 26,0,5,0 25,0,2,0 A

My questions are:

Is my code correct? Otherwise, please make recommanded code for me
What does basesN mean? (bases12-bases22)
What does ref mean, while I didn't put reference genome (.fa) file to the code?

Please.. Give me any answer, otherwise my professor will kill me 😹.

piechottam commented 5 months ago

This depends on you - you could add filtering " -a D ". If you know your library type - you could add it to your cal "-P ...".
basesN (you can find details in the manual). -> base12 -> condition 1 and replicate 2 (speaking in filenames: cond.1_rep.2.bam). The ","-vector corresponds to the observed base call count A,C,G,T
ref is the reference base. The MD field from the BAM file (check SAM file format) is used to deduce the reference base.

Chan-W00 commented 5 months ago

What a fast... thanks for reply.

So, according to your answer, "chr1 14521 14522 call-2 0.0639599068288419 . 1,0,15,0 0,0,12,0 1,0,13,0 1,0,21,0 G" means that in chromosome number 1 loci 14521-2, condition_1 rep_1 has 1,0,15,0 base conversion from G to A,C,G,T, respectively.

I got it and this is reason why the third position (G) is higher than others.

I really want to caculate total editing number per gene but have no idea about it, yet. Is there any recommandation for me to do this?

piechottam commented 2 months ago

Sry, for the late response.

You can use bedtools to add the gene name or some other ID (as new column based on coordinate overlap) to the JACUAS2 output. In R you could then aggregate over the freshly added column,

dieterich-lab / JACUSA2

JACUJA2 Code Example Request #73

contig start end name score strand bases11 bases12 bases21 bases22 info filter ref