aeeckhou / shallowHRD

This method uses shallow Whole Genome Sequencing (sWGS) and the segmentation of a genomic profile to assess the Homologous Recombination Deficiency of a tumor based on the number of Large-scale Genomic Alterations (LGAs).

Filename issues and continue on error #16

Closed: menzel closed this issue 10 months ago

menzel commented 10 months ago

If you run the program with a different filename for the bam_ratio.txt file, it crashes. The cause is probably the substitution in line 10: NAMEEE_intermediary2 = sub("*/*.bam_ratio.txt", "", NAMEEE_intermediary1). The crash happens a few lines later, when a temporary file is opened (here with the input file /path/ID.filtered.txt):

In file(file, "rt") : cannot open file '/path/ID.filtered.txt.bam_ratio.txt': No such file or directory
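The pattern passed to sub() above reads like a shell glob, whereas sub() interprets it as a regular expression, so a file name without the expected suffix passes through unchanged. A minimal sketch of a stricter alternative follows; `sample_prefix` is a hypothetical helper, not part of shallowHRD, and it assumes the input is expected to end in ".bam_ratio.txt":

```r
# Hypothetical helper (not shallowHRD code): derive the sample prefix from
# the input path and fail early if the expected suffix is missing.
sample_prefix <- function(path, suffix = ".bam_ratio.txt") {
  # endsWith() compares fixed strings, so no regex escaping is needed.
  if (!endsWith(path, suffix)) {
    stop("input file name must end in '", suffix, "': ", path, call. = FALSE)
  }
  # Drop the suffix by length instead of by pattern substitution.
  substr(path, 1L, nchar(path) - nchar(suffix))
}

sample_prefix("/path/ID.bam_ratio.txt")
# sample_prefix("/path/ID.filtered.txt") would stop with a clear message
```

Failing at this point names the offending file directly, instead of surfacing later as a "cannot open file" warning on a path that was never intended.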

This might be a minor issue, but it gets much worse with the following bit of code which I think ranks among the worst I have seen in code made for medical applications:

continue_on_error <- function() { paste() }
options(error=continue_on_error)

If it is removed, the code crashes in several places. Why not use either if-else clauses to catch issues beforehand, or at least several tryCatch blocks? This handler makes it virtually impossible to understand any issues or trust any results, as anything could have gone wrong.

For software written to be used in such an environment, any warning should be regarded as an error, and any error should always lead to a complete stop to prevent wrong results.
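The stop-on-anything behaviour described above could be sketched as follows. `run_step` is a hypothetical wrapper, not part of shallowHRD; it uses base R's tryCatch to turn both warnings and errors raised inside a step into an error that names the step:

```r
# Hypothetical wrapper (not shallowHRD code): evaluate one pipeline step and
# abort with a labelled error if it raises a warning or an error.
run_step <- function(label, expr) {
  tryCatch(
    expr,  # lazily evaluated here, so the handlers below are active
    warning = function(w) {
      stop(sprintf("step '%s' raised a warning: %s", label, conditionMessage(w)),
           call. = FALSE)
    },
    error = function(e) {
      stop(sprintf("step '%s' failed: %s", label, conditionMessage(e)),
           call. = FALSE)
    }
  )
}

ratio <- run_step("parse ratio", as.numeric("1.5"))    # succeeds
# run_step("parse ratio", as.numeric("abc"))           # coercion warning becomes an error
```

Setting options(warn = 2) at the top of a script has a similar global effect, since it converts every warning into an error. Under Rscript, with the default error option left in place, an uncaught error then halts the run.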

aeeckhou commented 10 months ago

Hello,

Regarding the 'continue_on_error' function, I agree that it is an ugly solution. I wrote it to avoid error messages when running the software from the command line on my hospital cluster, but also to handle loops that worked as intended yet output an error message on the last lines of a table, which stopped the program. Indeed, tryCatch or if-else clauses would have been better, but they would have taken me significantly longer.

I wrote the script alone, from scratch, over three years for my PhD, while finishing two other big projects. The time I had was spent improving the software beyond the first version, for the publication, to reach version 1.13 and provide the best possible sensitivity and specificity. However, we have reviewed more than 1000 sWGS samples with this version, and according to this extensive review the software works as intended. A version 2.0 will be published before the end of the year, further improving on what I delivered when I left the lab 1.5 years ago. I understand the frustration regarding the poor programming, but I did my best with the time and priorities I was given, and the people reviewing my results were satisfied with the outputs.

Now, regarding your issue: did you manage to solve it by changing the name of the input file? Changing the name to something without a dot before ".bam_ratio.txt" may solve it. The software is also supposed to work with both relative and absolute paths, but try switching between the two; it may help in your case.

Best, Alexandre

menzel commented 10 months ago

Thank you for your reply.

I just changed the filename back to the original after my edits, and it works now.

If you have an early version of 2.0 ready, I'd be happy to give it a try and provide some feedback if you are interested. We are currently implementing HRD for WGS in our routine diagnostics, so we have great interest in a good solution.

aeeckhou commented 10 months ago

Hello,

Glad the issue is solved!

Regarding version 2.0, I left the laboratory a while ago. What I know is that version 2.0 has been accepted in Oncogene but is not yet published (by the end of the year, I would say), and that it significantly improves on my 1.13 version, notably for borderline and low-quality cases. Regarding software availability, the discussion between my previous lab and the Institut Curie is still ongoing, in collaboration with the valuation and legal services. For now it is not available, sorry.

Best, Alexandre