Oshlack / JAFFA

JAFFA is a multi-step pipeline that takes either raw RNA-Seq reads or pre-assembled transcripts, then searches for gene fusions.
https://github.com/Oshlack/JAFFA/wiki

mutated cells detection #104

Closed mms100 closed 1 week ago

mms100 commented 1 month ago

Thanks a lot for this much-appreciated tool.

I have applied JAFFAL to single-cell long-read Nanopore data, and it works nicely on both the healthy and the mutated samples.

What I want to do now is detect the mutated cells in the mutated samples. Is there a way to capture those cells without applying FLAMES?

BTW, I have already applied this workflow: https://github.com/epi2me-labs/wf-single-cell

The developers of that tool mention that they have made many improvements to increase the accuracy of barcode matching, which is why I want to make use of it.

Many thanks, Mohamed.

nadiadavidson commented 1 month ago

Hi Mohamed,

I'm glad to hear that JAFFAL is working as expected on the pseudo-bulk samples.

JAFFAL has a very simple script you can use to pull out the cell barcodes corresponding to each fusion. It's located at /scripts/get_cell_barcodes_by_fusion.bash in the JAFFA installation directory, with usage as shown below.

USAGE: ./get_cell_barcodes_by_fusion.bash <list of fusions> <sample/sample.txt>

Where <list of fusions> is a list of fusions with colon-separated gene names, e.g. BCR:ABL1,
and <sample> is a long-read single-cell sample which has been processed by the
FLAMES method match_cell_barcode, as well as by JAFFAL.

This expects the read IDs to be in the format <barcode>_<UMI>..., e.g. >TTGCCGTTCGCGCCAA_GTGCTGTATT#SRR12282458.181421. You can check whether wf-single-cell has produced reads in this format by looking at the reads in jaffa_results.fasta.

If not, I would suggest trying Flexiplex, which we wrote for long-read single-cell barcode demultiplexing. It's very accurate and fast to run, and will produce reads compatible with the script above. If you use it, you will need to rerun JAFFAL on the demultiplexed data before running the script above.
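A minimal shell sketch of the format check on jaffa_results.fasta, assuming (as in the example read ID above) that the barcode and UMI contain only A/C/G/T, are joined by "_", and are followed by "#". The function names are hypothetical and are not part of JAFFA or Flexiplex.

```sh
# Hypothetical helpers, not part of JAFFA. Assumption: barcode and UMI are
# made only of A/C/G/T, joined by "_" and followed by "#", as in the example
# read ID; the pattern may appear anywhere in a FASTA header line.
ID_PATTERN='[ACGT]+_[ACGT]+#'

# Print "<matching>/<total>" header counts for a FASTA file.
check_read_id_format() {  # usage: check_read_id_format <fasta>
  local total matched
  total=$(grep -c '^>' "$1" || true)
  matched=$(grep '^>' "$1" | grep -cE "$ID_PATTERN" || true)
  echo "${matched}/${total} headers match <barcode>_<UMI>#..."
}

# Print the barcode part (text before "_") of every matching header.
barcodes_from_fasta() {  # usage: barcodes_from_fasta <fasta>
  grep '^>' "$1" | grep -oE "$ID_PATTERN" | cut -d_ -f1
}
```

For example, check_read_id_format jaffa_results.fasta reports how many headers match; if most do, barcodes_from_fasta jaffa_results.fasta | sort | uniq -c gives a rough count of reads per barcode across all fusions in that file.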

Good luck and please feel free to share if the solution above worked.

Cheers, Nadia.

mms100 commented 1 month ago

Hi Nadia,

Thank you so much for the detailed explanation.

I'd like your feedback on the following approach:

a- From sample_1.fastq.txt (one of the JAFFAL outputs), I will: 1- filter for the translocation of interest; 2- extract the read_ids from the first column.

Example:

[image]

b- From read_tags.txt (one of the epi2me-labs/wf-single-cell output files, which contains the corrected cell barcodes and the corresponding read_ids), I will: 1- extract the read_ids that are in common with the read_ids from step a-2; 2- extract their corrected cell barcodes ==> the barcodes of the cells carrying the translocation of interest.

Example:

[image]

I came up with this approach after reviewing this issue, where an epi2me developer referred to the read_tags file, which has both the read_ids and the corrected barcodes: https://github.com/epi2me-labs/wf-single-cell/issues/60
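Steps a- and b- amount to a filter followed by a join on read ID. The shell sketch below is hypothetical and assumes tab-separated files, that the fusion name appears as an exact field in sample_1.fastq.txt, and that read_tags.txt has a header row. The column names corrected_barcode and read_id are guesses to be checked against your file, and both files must use identical read IDs for the join to work.

```sh
# Hypothetical helpers sketching the two-step approach; not part of JAFFA or
# wf-single-cell. Assumptions (check against your own files):
#  - sample_1.fastq.txt is tab-separated, read ID in column 1, and the fusion
#    name (e.g. BCR:ABL1) appears as an exact field on the same line;
#  - read_tags.txt is tab-separated with a header row naming its columns;
#    the default column names below are guesses, so pass the real ones.

# Step a: unique read IDs supporting one fusion.
fusion_read_ids() {  # usage: fusion_read_ids <jaffal_txt> <fusion>
  awk -F'\t' -v fusion="$2" '
    { for (i = 2; i <= NF; i++) if ($i == fusion) { print $1; next } }
  ' "$1" | sort -u
}

# Step b: unique corrected barcodes for those read IDs.
barcodes_for_reads() {  # usage: barcodes_for_reads <read_ids> <read_tags> [barcode_col] [read_col]
  local bcol rcol
  bcol="${3:-corrected_barcode}"
  rcol="${4:-read_id}"
  awk -F'\t' -v bcol="$bcol" -v rcol="$rcol" '
    NR == FNR { want[$1] = 1; next }       # first file: read IDs to keep
    FNR == 1 {                             # header row of read_tags.txt
      for (i = 1; i <= NF; i++) { if ($i == rcol) r = i; if ($i == bcol) b = i }
      if (!r || !b) { print "column not found: " rcol " / " bcol > "/dev/stderr"; exit 1 }
      next
    }
    ($r in want) { print $b }              # keep barcodes of wanted reads
  ' "$1" "$2" | sort -u
}
```

Usage, e.g.:
fusion_read_ids sample_1.fastq.txt BCR:ABL1 > bcr_abl1_read_ids.txt
barcodes_for_reads bcr_abl1_read_ids.txt read_tags.txt > bcr_abl1_barcodes.txt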

Lastly, and sorry for the long questions:

Is there a way to adjust the threshold so that JAFFAL detects more reads with the translocation, e.g. making JAFFAL record the translocation even if it is detected in only one read? I just want to play around with the confidence level to see whether I detect more reads.

Please let me know your valuable input on that.

Many thanks in advance, Mohamed.

nadiadavidson commented 1 month ago

Hi Mohamed,

To answer the first part of the question: yes, I think this approach will work.

For the second part of the question: for the file that contains the read IDs and fusions, e.g. sample_1.fastq.txt, you can't really adjust anything to make it report more reads; it should already be reporting almost all recoverable reads that look chimeric. Filtering and the confidence classification are only applied at the stage of the final results table, jaffa_results.csv. Adjusting the options for that won't change the intermediate file (e.g. sample_1.fastq.txt) or how many cell barcodes you get. Hopefully I've understood your question correctly; if not, you are welcome to ask for more clarification.

Cheers, Nadia.

mms100 commented 1 month ago

Hi Nadia,

Thanks again for your clarification.

When I asked about retrieving more chimeric reads, I thought that JAFFAL applied a default threshold to the intermediate files, such as the file I currently care about (sample_1.fastq.txt).

I had this concern because when I used samtools on the BAM file output by the Nanopore workflow to extract the supplementary reads between the two fusion genes, it actually showed more reads. That's why I thought there was a threshold in JAFFAL even for the intermediate files.
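The exact samtools commands aren't shown in the thread, so the following is only one plausible way to do such a check: take alignments in a region around one gene and keep reads whose SA tag (the supplementary-alignment tag from the SAM specification) places another segment on the partner chromosome. The helper name, region variables and file names are hypothetical.

```sh
# Hypothetical helper, not part of samtools or JAFFA. Reads SAM records on
# stdin (e.g. from `samtools view`) and prints the names of reads whose SA
# tag (SA:Z:rname,pos,strand,CIGAR,mapQ,NM;...) places another segment on the
# partner chromosome, optionally only with its position within [pstart, pend].
# Assumption: chromosome names match your BAM header ("chr9" vs "9").
sa_partner_reads() {  # usage: ... | sa_partner_reads <partner_chrom> [start end]
  awk -F'\t' -v partner="$1" -v pstart="${2:-0}" -v pend="${3:-0}" '
    {
      for (i = 12; i <= NF; i++) {            # optional tags start at column 12
        if ($i !~ /^SA:Z:/) continue
        n = split(substr($i, 6), alns, ";")   # drop "SA:Z:", one entry per segment
        for (j = 1; j <= n; j++) {
          split(alns[j], f, ",")
          if (f[1] == partner && (pend + 0 == 0 || (f[2] + 0 >= pstart + 0 && f[2] + 0 <= pend + 0))) {
            print $1
            next
          }
        }
      }
    }
  ' | sort -u                                  # primary and supplementary records collapse to one name
}
```

For example: samtools view aligned.bam "$BCR_REGION" | sa_partner_reads chr9 "$ABL1_START" "$ABL1_END" | wc -l, where BCR_REGION, ABL1_START and ABL1_END are placeholders you would set from your annotation. Region queries need an indexed BAM.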

But I think I get it now, right? =D

Many thanks, Mohamed.

nadiadavidson commented 1 month ago

Hi Mohamed,

That's interesting that JAFFAL appears to miss fusion reads based on the BAM results. I wouldn't have expected this, but there are some rare types of breakpoint that are missed even in the .txt file. These include breakpoints with retained intronic sequence, genes fused in an antisense direction relative to each other, and fusions to a promoter.

If you are interested in another way to check, and you know the sequence upstream and downstream of your breakpoint, you could try Flexiplex (which I mentioned earlier). It can be used for barcode demultiplexing, but also as an error-tolerant search tool, a bit like grep. We used it to find fusions in short-read single-cell data as an example in our paper, https://academic.oup.com/bioinformatics/article/40/3/btae102/7611801, and the process would be very similar for long-read data. It might have higher sensitivity to detect fusion reads than either JAFFAL or samtools.
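To make the "a bit like grep" idea concrete, here is a crude exact-match search for a known junction sequence on both strands. Unlike Flexiplex it allows no mismatches, so it will miss reads with sequencing errors near the junction; the helper names are hypothetical and it assumes each FASTA sequence sits on a single line.

```sh
# Hypothetical helpers: exact (zero-mismatch) search for a known junction
# sequence on both strands. Flexiplex performs this kind of search with an
# edit-distance tolerance; this version does not.
# Assumption: each FASTA record's sequence sits on a single line.

# Reverse-complement a DNA string (A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
    print out
  }' | tr 'ACGTacgt' 'TGCAtgca'
}

# Print the IDs of reads containing the junction or its reverse complement.
reads_with_junction() {  # usage: reads_with_junction <junction_seq> <reads.fasta>
  local rc
  rc=$(revcomp "$1")
  awk -v fwd="$1" -v rc="$rc" '
    /^>/ { id = substr($0, 2); next }
    index($0, fwd) || index($0, rc) { print id }
  ' "$2"
}
```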

Good luck with the rest of your analysis!

Cheers, Nadia.

mms100 commented 1 month ago

Aaah okay. I am talking about the BCR::ABL1 translocation, BTW.

Many thanks for your appreciated input and I will try out Flexiplex for sure.

mms100 commented 3 weeks ago

Hi Nadia,

I was trying to figure out the steps to apply Flexiplex and JAFFAL, but I only know that my library should have one of the following translocations, let's say b2a2:

[image]

How can I use the demultiplexing and genotyping of Flexiplex? I think I will need to apply this:

flexiplex -d -k barcode_list.txt reads.fasta | flexiplex -n barcode_mutation_mapping -i false -x "GTATCGTCAAGGCACTCTTGCCTACGC" -k "CACTAGC,CACCAGC" -b "CAC?AGC" -x "TCCAACTACCACAAGTTTATATTCAGT" -e 0 -f 15 -u "" > kras_var_reads_with_barcodes.fasta

But I don't know how to replace the argument -k "CACTAGC,CACCAGC" to fit the translocation of interest, e13a2 (b2a2). Shall I put the whole sequence of BCR exon 13, then a comma, then the whole sequence of ABL1 exon 2?

Then, from what I understand, I should rerun JAFFAL, but now using the output of Flexiplex.

Is that correct?

Thanks for your continuous support, Mohamed.

nadiadavidson commented 3 weeks ago

Hi Mohamed,

The simple way to do this would be to alter the command like so:

flexiplex -d -k barcode_list.txt reads.fasta > reads_with_barcodes.fasta 
flexiplex -x "$SEQ" -d grep -f 2 reads_with_barcodes.fasta > fusion_reads_with_barcodes.fasta

Notes:

There's no need to run JAFFAL if you know the fusion sequence and run Flexiplex as above. You will already have all the fusion reads and cell IDs.
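One way to obtain a value for $SEQ is to join the 3' end of the upstream exon to the 5' start of the downstream exon (for b2a2/e13a2, BCR exon 13 followed by ABL1 exon 2). The sketch below is hypothetical: it assumes you supply the exon sequences yourself, in transcript orientation, and the flank length of 20 in the example is arbitrary, not a recommended setting.

```sh
# Hypothetical helper: build a junction search sequence from the last
# <flank_len> bases of the upstream exon and the first <flank_len> bases of
# the downstream exon. Exon sequences must be supplied by you.
junction_seq() {  # usage: junction_seq <upstream_exon> <downstream_exon> <flank_len>
  if [ "${#1}" -lt "$3" ] || [ "${#2}" -lt "$3" ]; then
    echo "exon sequence shorter than flank length" >&2
    return 1
  fi
  printf '%s %s\n' "$1" "$2" |
    awk -v k="$3" '{ print substr($1, length($1) - k + 1) substr($2, 1, k) }'
}

# Example use (BCR_EXON13 and ABL1_EXON2 are placeholders you must fill in):
# SEQ=$(junction_seq "$BCR_EXON13" "$ABL1_EXON2" 20)
# flexiplex -x "$SEQ" -d grep -f 2 reads_with_barcodes.fasta > fusion_reads_with_barcodes.fasta
```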

Hope this is helpful and please feel free to ask further questions.

Cheers, Nadia.

mms100 commented 1 week ago

Dear Nadia,

Many thanks again for your valuable input.

I have applied Flexiplex and reached the same result as with JAFFAL.

Please feel free to close this issue.
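If you want to quantify how closely two approaches agree, one option is to compare their barcode lists directly. A minimal sketch, assuming one barcode per line in each file; the function name is hypothetical.

```sh
# Hypothetical helper: count barcodes shared between two lists and unique to
# each (one barcode per line; duplicates are removed before comparing).
compare_barcodes() {  # usage: compare_barcodes <list_a> <list_b>
  local a b
  a=$(mktemp); b=$(mktemp)
  sort -u "$1" > "$a"; sort -u "$2" > "$b"     # comm needs sorted input
  printf 'shared\t%s\n' "$(comm -12 "$a" "$b" | wc -l | tr -d ' ')"
  printf 'only_a\t%s\n' "$(comm -23 "$a" "$b" | wc -l | tr -d ' ')"
  printf 'only_b\t%s\n' "$(comm -13 "$a" "$b" | wc -l | tr -d ' ')"
  rm -f "$a" "$b"
}
```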

Best, Mohamed.

nadiadavidson commented 1 week ago

Thanks for the update. I will close the issue now.

Cheers, Nadia.