gbouras13 / dnaapler

Reorients assembled microbial sequences
MIT License
102 stars, 3 forks

Fix two bugs: macOS MMseqs2 version and integer contig names #88

Closed: rrwick closed this pull request 4 days ago

rrwick commented 4 days ago

macOS MMseqs2 version

Currently, Dnaapler expects the MMseqs2 version string to be in the format major.minor. However, the macOS MMseqs2 v13.45111 binary (downloaded from the MMseqs2 releases page) reports its version as a hash (45111b641859ed0ddd875b94d6fd1aef1a675b7e). This causes Dnaapler's version check to fail.

Fix: Updated the version-check logic to also accept hash-based version strings.
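For illustration only, a check that accepts both forms might look roughly like this. It is a hypothetical sketch, not Dnaapler's actual code: the function names, the regular expressions and the minimum major version of 13 are all assumptions. It treats any `major.suffix` string as a release version and a 40-character hexadecimal string as a hash-based build.

```python
import re
from typing import Optional

# Hypothetical minimum major version; Dnaapler's real requirement may differ.
MIN_MMSEQS2_MAJOR = 13

# Release-style strings such as "13.45111": digits, a dot, then a suffix.
RELEASE_RE = re.compile(r"^(\d+)\.([0-9A-Za-z]+)$")
# Hash-style strings: 40 hexadecimal characters, as reported by the macOS binary.
HASH_RE = re.compile(r"^[0-9a-fA-F]{40}$")


def parse_mmseqs2_major(version_str: str) -> Optional[int]:
    """Return the major version for release strings, or None for a hash."""
    version_str = version_str.strip()
    release = RELEASE_RE.match(version_str)
    if release:
        return int(release.group(1))
    if HASH_RE.match(version_str):
        return None  # hash-based build: no numeric version to compare
    raise ValueError(f"Unrecognised MMseqs2 version string: {version_str!r}")


def mmseqs2_version_ok(version_str: str) -> bool:
    """Accept hash-based builds; otherwise require the minimum major version."""
    major = parse_mmseqs2_major(version_str)
    if major is None:
        return True
    return major >= MIN_MMSEQS2_MAJOR


print(mmseqs2_version_ok("13.45111"))  # True
print(mmseqs2_version_ok("45111b641859ed0ddd875b94d6fd1aef1a675b7e"))  # True
```

Accepting any hash skips the numeric comparison entirely for such builds; a stricter variant could log a warning instead, since a hash alone does not say which release it belongs to.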

Integer contig names

When running Dnaapler on an Autocycler assembly with integer-named sequences, the following issue arose:

```python
filtered_df = MMseqs2_df[MMseqs2_df["qseqid"] == short_contig]
```

In this case, short_contig was a string, but the qseqid column in the MMseqs2 results dataframe was inferred as an integer type. This type mismatch caused the filtered dataframe to always be empty.
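A minimal reproduction of the mismatch, assuming the MMseqs2 results are read as headerless tab-separated text with `pandas.read_csv`. The three-column layout and the sample rows are hypothetical; only the `qseqid` column name and the filtering line come from the description above.

```python
import io

import pandas as pd

# Hypothetical MMseqs2 tabular output (tab-separated, no header) for
# contigs named with integers, as in an Autocycler assembly.
mmseqs2_tsv = "1\tdnaA_ref\t98.5\n2\tdnaA_ref\t45.0\n"
columns = ["qseqid", "sseqid", "pident"]

MMseqs2_df = pd.read_csv(io.StringIO(mmseqs2_tsv), sep="\t", names=columns)
print(MMseqs2_df["qseqid"].dtype)  # int64: pandas inferred integers

short_contig = "1"  # contig names read from the FASTA file are strings
filtered_df = MMseqs2_df[MMseqs2_df["qseqid"] == short_contig]
print(len(filtered_df))  # 0: the integer 1 never equals the string "1"
```

Note that pandas does not raise an error for this comparison; it silently returns all-False, which is why the bug shows up as an empty result rather than a crash.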

Fix: Forced the qseqid column to always load as a string (object dtype) when reading MMseqs2 results with pandas.
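A sketch of this kind of fix, using hypothetical input: passing `dtype={"qseqid": str}` to `pandas.read_csv` keeps integer-like contig names as text, so the equality filter against a string contig name matches. The column layout and sample rows are assumptions, not Dnaapler's actual file format.

```python
import io

import pandas as pd

# Hypothetical MMseqs2 tabular output with integer-like contig names.
mmseqs2_tsv = "1\tdnaA_ref\t98.5\n2\tdnaA_ref\t45.0\n"
columns = ["qseqid", "sseqid", "pident"]

# Force qseqid to load as strings so names like "1" stay text;
# the other columns are still type-inferred as usual.
MMseqs2_df = pd.read_csv(
    io.StringIO(mmseqs2_tsv), sep="\t", names=columns, dtype={"qseqid": str}
)

short_contig = "1"
filtered_df = MMseqs2_df[MMseqs2_df["qseqid"] == short_contig]
print(len(filtered_df))  # 1: the row for contig "1" is now found
```

Setting the dtype for only `qseqid` keeps numeric columns such as `pident` as floats, so downstream numeric filtering is unaffected.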

rrwick commented 4 days ago

I tested the changes on my Mac using an Autocycler assembly, and it all worked well! But more thorough testing may be warranted if you've got some varied test cases 😄

codecov[bot] commented 4 days ago

Codecov Report

Attention: Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.

Project coverage is 85.89%. Comparing base (7ef230e) to head (c516810). Report is 50 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/dnaapler/utils/util.py | 33.33% | 3 Missing and 3 partials ⚠️ |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
- Coverage   87.27%   85.89%   -1.38%
==========================================
  Files           9        9
  Lines        1108     1177      +69
  Branches      143      154      +11
==========================================
+ Hits          967     1011      +44
- Misses        103      118      +15
- Partials       38       48      +10
==========================================
```
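The report's percentages are consistent with the ratio hits / (hits + misses + partials), expressed as a percentage and rounded down to two decimal places (assumed here to be Codecov's default rounding). The hypothetical `coverage_pct` helper below checks the base, head and patch figures; the patch counts of 3 hits, 3 misses and 3 partials are inferred from the 33.33% figure and the "3 Missing and 3 partials" line, not stated in the report.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, rounded down to two decimal places."""
    ratio = hits / (hits + misses + partials)
    return math.floor(ratio * 100 * 100) / 100


base = coverage_pct(967, 103, 38)    # main
head = coverage_pct(1011, 118, 48)   # PR #88
patch = coverage_pct(3, 3, 3)        # inferred patch counts
print(base, head, round(head - base, 2))  # 87.27 85.89 -1.38
```

Rounding down matters for the head figure: 1011 / 1177 is about 85.896%, which would round to 85.90% but truncates to the reported 85.89%.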

gbouras13 commented 4 days ago

Legend, thank you @rrwick! This would have been a pre-v1.0.0 bug too.