bcgsc / physlr

:chains: Construct a Physical Map from Linked Reads
GNU General Public License v3.0

Iterative mol sep 2 #148

Closed aafshinfard closed 2 years ago

aafshinfard commented 4 years ago

Runs molecule separation iteratively without writing to file until the end, which makes iteration faster. There is a preset mol_strategy called iterative that runs three rounds sequentially (distributed+sqcosbin, then sqcosbin, then sqcosbin again), each with different settings (see set_settings()).
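The control flow described above can be sketched roughly as follows. This is only an illustration, not the code in this branch: `ROUNDS`, the `threshold` values, `separate_once`, and `separate_iteratively` are hypothetical names, and the per-round step is a placeholder that keeps every molecule whole.

```python
import io

# Hypothetical per-round settings; the real values live in set_settings().
ROUNDS = [
    {"strategy": "distributed+sqcosbin", "threshold": 0.5},
    {"strategy": "sqcosbin", "threshold": 0.6},
    {"strategy": "sqcosbin", "threshold": 0.7},
]


def separate_once(molecules, settings):
    """Placeholder for one round: each molecule stays whole and gains a '_0' suffix."""
    return [f"{molecule}_0" for molecule in molecules]


def separate_iteratively(barcodes, rounds, out):
    """Run every round in memory and write the result once at the end."""
    molecules = list(barcodes)
    for settings in rounds:
        molecules = separate_once(molecules, settings)  # no intermediate file
    out.write("\n".join(molecules) + "\n")  # single write after the last round
    return molecules


buffer = io.StringIO()
result = separate_iteratively(["ACGT", "TTGA"], ROUNDS, buffer)
print(result)
```

Each round appends its own index to the vertex name, so three rounds give every id three numeric suffixes.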

lcoombe commented 4 years ago

Looks like pylint still isn't happy -- do you want to ping Johnathan and me when CI is passing so we can review?

aafshinfard commented 4 years ago

@lcoombe @jowong4 I'm ready for reviews. Thanks. Since the results of distributed+sqcosbin++sqcosbin++sqcosbin have not been compared to the current default, I did not set it as the default strategy, so we still run distributed+sqcosbin by default. Because the code has been substantially reformatted, we can run distributed+sqcosbin again with this branch to check whether the results are identical to master.

aafshinfard commented 4 years ago

@jowong4 @lcoombe I just added a new function to reformat the molecule ids from a messy form like ACGT_0_0_1_0_2 to a single number. Iterative molecule separation runs as before, and at the end it calls the new function to reformat the molecule ids. Will now address the failed checks...
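For illustration, the id clean-up could look something like the sketch below. It is not the function added in this branch: `flatten_molecule_ids` is a hypothetical name, and it assumes that the barcode is everything before the first underscore and that molecules are numbered consecutively per barcode in order of first appearance.

```python
from collections import defaultdict


def flatten_molecule_ids(vertex_ids):
    """Map ids like 'ACGT_0_0_1_0_2' to 'ACGT_<n>', with n consecutive per barcode."""
    next_index = defaultdict(int)  # next free molecule number for each barcode
    mapping = {}
    for vertex_id in vertex_ids:
        if vertex_id in mapping:
            continue  # already renamed
        # Assumption: the barcode itself contains no underscore.
        barcode, _, _ = vertex_id.partition("_")
        mapping[vertex_id] = f"{barcode}_{next_index[barcode]}"
        next_index[barcode] += 1
    return mapping


ids = ["ACGT_0_0_1_0_2", "ACGT_0_0_1_0_0", "TTGA_1_0", "ACGT_0_1_0_0_0"]
flat = flatten_molecule_ids(ids)
for old, new in flat.items():
    print(old, "->", new)
```

The returned mapping can then be used to relabel the vertices once, after the last round.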

aafshinfard commented 4 years ago

and now all checks have passed!

aafshinfard commented 4 years ago

When I use cosine similarity (even with the current master branch), I sometimes run into an issue with Python multiprocessing:

/projects/btl/aafshinfard/virtuEnv/pypy3/lib-python/3/multiprocessing/semaphore_tracker.py:129: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
  len(cache))

Do you know whether it may affect the results, for example by not processing some of the vertices?

aafshinfard commented 4 years ago

In a single test-run, I found an unexpected error:

8289 Identified molecules
Traceback (most recent call last):
  File "/projects/btl_scratch/aafshinfard/projects/physlr/iterative_mol_sep_2/physlr/bin/physlr", line 19, in <module>
    physlr.physlr.main()
  File "/projects/btl_scratch/aafshinfard/projects/physlr/iterative_mol_sep_2/physlr/physlr/physlr.py", line 3043, in main
    Physlr().main()
  File "/projects/btl_scratch/aafshinfard/projects/physlr/iterative_mol_sep_2/physlr/physlr/physlr.py", line 3039, in main
    getattr(Physlr, method_name)(self)
  File "/projects/btl_scratch/aafshinfard/projects/physlr/iterative_mol_sep_2/physlr/physlr/physlr.py", line 1914, in physlr_molecules
    m = gin.nodes[u]["m"]
KeyError: 'm'
make: *** [hg004.k40-w32.n100-5000.c2-100.physlr.overlap.m92.5.iterative.mol.tsv] Error 1
make: *** Deleting file `hg004.k40-w32.n100-5000.c2-100.physlr.overlap.m92.5.iterative.mol.tsv'

more precisely:

m = gin.nodes[u]["m"]
KeyError: 'm'

This happened in the 3rd round of molecule separation, which means the algorithm passed through the first and second rounds without this error! And in each round I do add the attribute m to every vertex I add to the graph!
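One way to narrow this down might be to list the vertices that lack m right before the loop that reads it. The sketch below assumes gin.nodes behaves like a mapping from vertex to attribute dict (as a networkx NodeView does); `nodes_missing_attribute` and the toy data are hypothetical and only stand in for the real graph.

```python
def nodes_missing_attribute(nodes, attr):
    """Return the node ids whose attribute dict lacks `attr`."""
    return [node for node, data in nodes.items() if attr not in data]


# Toy stand-in for gin.nodes; vertex ids and values are made up.
toy_nodes = {
    "ACGT_0": {"m": 0},
    "ACGT_1": {"m": 1},
    "TTGA_0": {},  # e.g. a vertex created implicitly as an edge endpoint
}

missing = nodes_missing_attribute(toy_nodes, "m")
print(missing)
```

If gin is a networkx graph, the same helper could be called as `nodes_missing_attribute(gin.nodes, "m")`. Note that networkx's `add_edge` creates any endpoint that does not exist yet with an empty attribute dict, so a vertex can end up without m even if every explicit `add_node` call sets it. Whether that is what happened in the 3rd round here is not established.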