Closed rvicedomini closed 3 months ago
I've made some modifications related to the -s option. More precisely, I replaced the -s option with the -i option, which accepts a file path to an existing matrix (previously, -s assumed the matrix had a fixed name within the output folder).
Now you can use a previously computed input matrix and make the pipeline work in a different output directory (if desired).
One thing I'd like to do is avoid requiring the positional argument (fof file) when the input matrix is provided through -i, as it is only required by kmtricks (which is skipped).
I just committed an update that ignores the positional argument, if present, when the -i option is provided.
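For illustration only, here is a minimal Python sketch of the argument handling described above. It is not the actual muset code: the function name parse_args, the program name, and the help strings are hypothetical. Only the -i option and the positional fof file come from this thread. I've also assumed that the fof file stays mandatory when -i is absent.

```python
import argparse


def parse_args(argv):
    """Parse a muset-like command line (hypothetical sketch, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="muset-sketch")
    parser.add_argument("-i", dest="input_matrix", metavar="FILE",
                        help="path to an existing input matrix")
    # nargs="?" makes the positional optional at parse time.
    parser.add_argument("fof", nargs="?",
                        help="fof file, only needed when the matrix must be built")
    args = parser.parse_args(argv)

    if args.input_matrix is None and args.fof is None:
        # Assumption: without -i the fof file is still required.
        parser.error("a fof file is required unless -i is provided")
    if args.input_matrix is not None:
        # The matrix-building step is skipped, so any fof argument is ignored.
        args.fof = None
    return args
```

With this shape, both `-i matrix.mat` and `-i matrix.mat fof.txt` are accepted, and the fof file is simply dropped in the second case.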
@CamilaDuitama & @frankandreace, I was looking at the code and noticed that the last part of the pipeline is quite inefficient for two main reasons. First, sorting does not seem to be needed (also considering that ggcat output is not deterministic). Second, for each unitig ID, a linear scan of the unitig FASTA is done to find its sequence (which makes this step quadratic in the number of unitigs).
If all we want as final output is a unitig matrix containing the actual sequence in the first column (instead of the ID), I only have to change one line of code in the kmat_tools unitig command (thus skipping the aforementioned part). I could also add a flag if you want the user to decide whether to have the ID or the sequence in the first column.
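To make the complexity point concrete, here is a small Python sketch of how the ID-to-sequence step could be done in one pass over the FASTA plus one pass over the matrix, rather than one FASTA scan per unitig. This is only an illustration, not what kmat_tools unitig does internally. The function names are hypothetical, and I'm assuming a whitespace-separated matrix with the unitig ID in the first column and FASTA headers whose first token is that ID.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs; the ID is the first token after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def ids_to_sequences(matrix_rows, fasta_lines):
    """Replace the ID in the first column of each row with its sequence."""
    # Single pass over the FASTA: each lookup below is then O(1) on average,
    # so the whole step is linear in the number of unitigs.
    seq_by_id = dict(read_fasta(fasta_lines))
    for row in matrix_rows:
        unitig_id, *values = row.split()
        yield " ".join([seq_by_id[unitig_id], *values])
```

Rows are emitted in the order they appear in the matrix, so no sorting is involved.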
Yes please, if it's more easily done from the kmat_tools unitig code then it's much better. Thanks @rvicedomini
A -s flag has been added to both the muset and kmat_tools unitig commands to write the unitig sequence in the output matrix instead of the ID (writing the ID remains the default behavior when the flag is not used).
I also made a small change to the main script. More precisely, I now update the PATH environment variable at the beginning using the directory of the script. This way, the kmat_tools executable that gets called should be the one in the same folder as muset.
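As a rough illustration of the PATH idea, here is the same logic sketched in Python. The actual change lives in the muset main script and may look different. The function name and the example path in the usage note are hypothetical.

```python
import os


def prepend_script_dir_to_path(script_path, env):
    """Put the directory containing script_path first in env['PATH']."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    current = env.get("PATH", "")
    # Earlier PATH entries win during lookup, so executables next to the
    # script are found before any other copy installed elsewhere.
    env["PATH"] = script_dir + (os.pathsep + current if current else "")
    return script_dir
```

For example, calling `prepend_script_dir_to_path(__file__, os.environ)` at the top of a launcher would make child processes resolve sibling executables first.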
Hi @CamilaDuitama & @frankandreace,
I want to mention that I made some important changes to the repository.
conda/muset (it is not necessary to create a full copy of the sources to build a conda package!). To build the package, run conda-build conda/muset from the root of the project. @CamilaDuitama you should probably delete the "muset" conda packages you built previously and rebuild one from scratch.
environment.yaml
test/ path prefix from the fof.txt file. I think muset should be run from the base test directory to avoid "polluting" the main repository directory. I don't know... we might even consider renaming it example... not a pressing matter, though.
I don't know if there is more to add... if there is, I'll add it later...