FelixBaensch / MORTAR

MOlecule fRagmenTAtion fRamework
MIT License
18 stars 3 forks source link

Fixed SD files not being imported completely #12

Closed SamuelBehr closed 8 months ago

SamuelBehr commented 1 year ago

Fixed SD files not being imported in their entire length if structures fail to be read by the reader (caused the import to stop); structures imported from SDF are now assigned the file name extended with the index of the structure in the file (formerly the count of successfull imports) if no name of the structure could be detected;

Please review!

JonasSchaub commented 1 year ago

Let me summarise the problem as I understand it: We did not configure the SDF reader to skip erroneous structures before. Which led to the whole import being aborted when an erroneous structure was encountered. Correct? Question: Why not simply configure the reader to skip at the beginning (you can actually set this in the constructor)? Because we would not be able to log it then? Would the reader just implicitly without any message directly jump to the next parseable structure?

FelixBaensch commented 1 year ago

Let me summarise the problem as I understand it: We did not configure the SDF reader to skip erroneous structures before. Which led to the whole import being aborted when an erroneous structure was encountered. Correct? Question: Why not simply configure the reader to skip at the beginning (you can actually set this in the constructor)? Because we would not be able to log it then? Would the reader just implicitly without any message directly jump to the next parseable structure?

This is exactly the problem, if we set "skip" in the Ctor, we cannot log the problematic SDF entries. For me the code is okay.

JonasSchaub commented 1 year ago

And if there are multiple erroneous entries in a row in the input file? They will be skipped together, right?

SamuelBehr commented 1 year ago

And if there are multiple erroneous entries in a row in the input file? They will be skipped together, right?

Hmmm, yes, but it will lead to the counter of the structures getting faulty...

At this point, I might be forced to extend the IteratingSDFReader with a respective counter.

JonasSchaub commented 1 year ago

Hmmm, yes, but it will lead to the counter of the structures getting faulty...

Exactly my thinking. I would not put priority on this point, rather remove the statement that the counters/indices refer to the position in the input file from the doc.

But one possible solution would be to parse the SD file yourself, put everything between two lines of "$$$" in a String/buffer, and then apply the MOL reader to that. If simply extending the IteratingSDFReader does not work. I guess this class does nothing else than what I just described.

FelixBaensch commented 1 year ago

Hmmm, yes, but it will lead to the counter of the structures getting faulty...

Exactly my thinking. I would not put priority on this point, rather remove the statement that the counters/indices refer to the position in the input file from the doc.

But one possible solution would be to parse the SD file yourself, put everything between two lines of "$$$" in a String/buffer, and then apply the MOL reader to that. If simply extending the IteratingSDFReader does not work. I guess this class does nothing else than what I just described.

Please don't, it all misses the mark by a long way. Leave the indexation of names as it is and that's it. Anything else would be too much of a good thing.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

sonarcloud[bot] commented 8 months ago

Quality Gate Passed Quality Gate passed

Issues
7 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud