AlexandrovLab / SigProfilerAssignmentR

R wrapper for utilizing the SigProfilerAssignment framework
BSD 2-Clause "Simplified" License

Ordering of rows critical #1

Closed ab08028 closed 7 months ago

ab08028 commented 1 year ago

Hi there, thanks for such a great set of tools!

I wanted to report a possible bug or at least a bit of dangerous non-error throwing:

I created my own input matrix and made the MutationTypes column in the format A[C>G]A, etc.

SigProfilerAssignmentR's cosmic_fit() ran on this input matrix with no errors.

However, the signatures it provided were bizarre -- SBS1 didn't have spikes concentrated at CpG>TpG 3mers! The activities were similarly strange and involved many more COSMIC signatures than I'd expect.

I tracked the issue down to my MutationTypes column being in a different row-order than the test data matrix.

This seems like a sneaky bug that could cause issues downstream for folks, or at least needs a big warning in the documentation somewhere. The program ran 'correctly' and gave me activities and signature identities, and it was only because I know what SBS1 is supposed to look like that my suspicions were raised.

To fix it, I had to read in the test data and reorder my dataframe to match the order of rows in the test data. The results now make much more sense and SBS1 looks as it should:

inputMatrix <- inputMatrix[match(testData$MutationType, inputMatrix$MutationType),]
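A slightly more defensive version of that reordering step could look like the sketch below. It is only an illustration: `reorder_to_reference()` is a hypothetical helper (not part of SigProfilerAssignmentR), the toy data frame stands in for a real matrix, and it assumes the mutation types live in a column named `MutationType`, as in the one-liner above. Unlike a bare `match()`, it stops with an error if the two sets of mutation types differ, rather than silently producing `NA` rows.

```r
# Hypothetical helper (not part of SigProfilerAssignmentR): reorder the rows of
# an input matrix so its mutation types follow a reference order, and fail
# loudly if the two sets of mutation types do not match exactly.
reorder_to_reference <- function(input_matrix, reference_types,
                                 type_col = "MutationType") {
  input_types <- input_matrix[[type_col]]

  # Duplicated labels would make the row matching ambiguous.
  if (anyDuplicated(input_types) > 0) {
    stop("Input matrix contains duplicated mutation types.")
  }

  # Any label present on one side but not the other is an error.
  missing_types <- setdiff(reference_types, input_types)
  extra_types <- setdiff(input_types, reference_types)
  if (length(missing_types) > 0 || length(extra_types) > 0) {
    stop("Mutation types differ from the reference. Missing: ",
         paste(missing_types, collapse = ", "),
         "; unexpected: ", paste(extra_types, collapse = ", "))
  }

  # Same indexing as the one-liner above, keeping the data frame shape.
  input_matrix[match(reference_types, input_types), , drop = FALSE]
}

# Toy example with three contexts in shuffled order (illustrative data only).
reference_types <- c("A[C>A]A", "A[C>A]C", "A[C>A]G")
toy_matrix <- data.frame(
  MutationType = c("A[C>A]G", "A[C>A]A", "A[C>A]C"),
  Sample1 = c(3, 1, 2),
  stringsAsFactors = FALSE
)
reordered <- reorder_to_reference(toy_matrix, reference_types)
```

With a real matrix, `reference_types` would be taken from the test data, e.g. `testData$MutationType`.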

mdbarnesUCSD commented 1 year ago

Hi @ab08028,

Thanks for reaching out. We will be looking to implement a way to better handle input files in a future release.

mdbarnesUCSD commented 7 months ago

The SigProfilerAssignment v0.1.2 update now sorts the user's input by mutation context. Please update and reach out if you encounter any issues. Thanks!