AlexandrovLab / SigProfilerTopography

SigProfilerTopography allows evaluating the effect of chromatin organization, histone modifications, transcription factor binding, DNA replication, and DNA transcription on the activities of different mutational processes. SigProfilerTopography elucidates the unique topographical characteristics of mutational signatures.
BSD 2-Clause "Simplified" License

Strand bias - mutation-level information? #8

Closed: maia-munteanu closed this issue 3 months ago

maia-munteanu commented 6 months ago

Hi! Thank you for developing this nice tool. I was wondering, is there an option to get mutation-level information about strand bias? i.e. for each mutation in each sample, is it leading/lagging or transcribed/untranscribed? I tried the sample_based=True argument but I get this error: TypeError: runAnalyses() got an unexpected keyword argument 'sample_based'

Many thanks, Maia

burcakotlu commented 6 months ago

Dear Maia,

Thank you very much for using SigProfilerTopography.

If you run SPT with "delete_unnecessary_files = False" in the "runAnalyses" method, you will find chrom-based annotated text files under path/to/outputDir/jobname/data. In these text files, the "TranscriptionStrand" column gives the transcription strand of each mutation.
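To check which of the retained files actually carry that column, here is a minimal sketch. It assumes the annotated files are tab-delimited text with a header row and a .txt extension; none of that is confirmed above, so adjust `pattern` and `delimiter` to whatever your output looks like. The helper name `files_with_column` is only illustrative and is not part of SPT.

```python
import csv
from pathlib import Path


def files_with_column(data_dir, column="TranscriptionStrand",
                      pattern="*.txt", delimiter="\t"):
    """Return the files under data_dir whose header row contains `column`.

    Assumptions (not confirmed by SPT): files are delimited text with the
    column names on the first line.
    """
    matches = []
    for path in sorted(Path(data_dir).rglob(pattern)):
        with path.open(newline="") as handle:
            # Read only the first row; an empty file yields an empty header.
            header = next(csv.reader(handle, delimiter=delimiter), [])
        if column in header:
            matches.append(path)
    return matches
```

For example, `files_with_column("path/to/outputDir/jobname/data")` lists the files you can read for per-mutation transcription strand.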

The "ReplicationStrand" column is computed for each mutation but is not currently written to these chrom-based annotated text files under path/to/outputDir/jobname/data.

Writing it out is doable, though, and I can implement it. I will let you know when it is committed.

The "sample_based" option was meant to provide results for each sample. Because it produces a huge number of figures, it is not maintained at the moment.

If you have any further questions, please let me know.

Best, Burcak

maia-munteanu commented 6 months ago

Dear Burcak,

Thank you for getting back to me so quickly! I'd really appreciate a message when the replication strand code is committed. About the option delete_unnecessary_files = False: it still gives me a similar error as above, perhaps I'm missing something:

topography.runAnalyses(genome, 
                   inputDir, 
                   outputDir, 
                   jobname, 
                   numofSimulations, 
                   epigenomics=False,
                   nucleosome=False, 
                   replication_time=False, 
                   strand_bias=True, 
                   processivity=False,
                   delete_unnecessary_files=False)

TypeError: runAnalyses() got an unexpected keyword argument 'delete_unnecessary_files'

Many thanks, Maia

burcakotlu commented 6 months ago

Dear Maia,

Please update SPT to version 1.0.88. Also, update your code as follows:

topography.runAnalyses(genome,
                   inputDir,
                   outputDir,
                   jobname,
                   numofSimulations,
                   epigenomics=False,
                   nucleosome=False,
                   replication_time=False,
                   strand_bias=True,
                   processivity=False,
                   mutation_types=['SBS', 'DBS', 'ID'],
                   delete_unnecessary_files=False)

Set mutation_types according to your input: include 'SBS' if it has single base substitutions, 'DBS' if it has double base substitutions, and 'ID' if it has small insertions and deletions.
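The TypeError above means the installed version's runAnalyses does not know the keyword yet. As a rough sketch, you can check this before launching a long run by inspecting the function signature with the standard library. The helper name `unsupported_keywords` is hypothetical and not part of SPT.

```python
import inspect


def unsupported_keywords(func, keywords):
    """Return the names in `keywords` that `func` cannot take as keyword arguments."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    # Only these two parameter kinds can be passed by name.
    by_name = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
               inspect.Parameter.KEYWORD_ONLY)
    accepted = {name for name, p in params.items() if p.kind in by_name}
    return [name for name in keywords if name not in accepted]
```

With the `topography` module already imported as in your script, `unsupported_keywords(topography.runAnalyses, ["mutation_types", "delete_unnecessary_files"])` should return an empty list once the update has taken effect. A non-empty list suggests the old version is still the one being imported.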

If you have any problems, please let me know.

Best wishes, Burcak

maia-munteanu commented 6 months ago

Thank you, that seems to have done the trick!

Maia

burcakotlu commented 3 months ago

Dear Maia,

I remember your email requesting mutation-level information about strand bias for each mutation in each sample.

Now it is implemented. You can download and check SPT version 1.0.92.

Chrom-based files under .../output_dir/jobname/data/chrbased/ contain replication and transcription strand information for each mutation.

Replication Strand
A: Lagging
E: Leading
U: Unknown
B: Bidirectional, i.e. both Lagging and Leading (can happen for long indels)

Transcription Strand
T: Transcribed
U: Untranscribed
B: Bidirectional
N: Nontranscribed
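If you want to turn these codes into per-sample counts, a minimal sketch follows. It assumes the chrom-based files are tab-delimited with a header row, that the strand columns are named "ReplicationStrand" and "TranscriptionStrand", and that there is a sample column called "Sample". The last name and the delimiter are assumptions, so please check them against your own files. The function `tally_strands` is illustrative only and not part of SPT.

```python
import csv
from collections import Counter

# The two legends are kept separate because "U" means Unknown for
# replication but Untranscribed for transcription.
REPLICATION_STRAND = {"A": "Lagging", "E": "Leading",
                      "U": "Unknown", "B": "Bidirectional"}
TRANSCRIPTION_STRAND = {"T": "Transcribed", "U": "Untranscribed",
                        "B": "Bidirectional", "N": "Nontranscribed"}


def tally_strands(handle, sample_column="Sample", delimiter="\t"):
    """Count mutations per (sample, replication label, transcription label).

    `handle` is any open text file or iterable of lines with a header row.
    Column names and the delimiter are assumptions, not confirmed by SPT.
    """
    counts = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        replication = REPLICATION_STRAND.get(
            row.get("ReplicationStrand"), "Unrecognized")
        transcription = TRANSCRIPTION_STRAND.get(
            row.get("TranscriptionStrand"), "Unrecognized")
        counts[(row.get(sample_column), replication, transcription)] += 1
    return counts
```

You would open each chrom-based file with `open(path, newline="")`, pass the handle to `tally_strands`, and add up the returned counters across chromosomes.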


Also, there are further text and Excel files under .../output_dir/jobname/data/replication_strand_bias/ summarizing replication strand asymmetries.

Likewise, text and Excel files under .../output_dir/jobname/data/transcription_strand_bias/ summarize transcription strand asymmetries.

If you have any suggestions/questions, please let me know.

Best wishes, Burcak

maia-munteanu commented 3 months ago

Dear Burcak,

Thank you so much for letting me know, I'll try to run it today!

Many thanks, Maia

burcakotlu commented 3 months ago

Dear Maia,

Please try it. If you have any suggestions or encounter any problems please let me know.

Best wishes, Burcak