chrisquince / STRONG

Strain Resolution ON Graphs
MIT License

Error in flag_bad_cogs on FMTMeren #73

Closed chrisquince closed 5 years ago

chrisquince commented 5 years ago

Running the FMT data set from Meren throws an error in the flag_bad_cogs rule for Bin_134. I find this step very hard to debug, and I think it would be a good candidate for refactoring.

The run is here:

/mnt/gpfs/Hackathon/FMTMeren/CoAssembly77_Rerun

Sebastien-Raguideau commented 5 years ago

Issue 1: merged bins are called "binmerged" in Common_unitigs.py, which makes it fail to parse them on the second run. Solution: Common_unitigs.py takes the list of bins as arguments.
Issue 2: bin_merged_3 and Bin_137 are missing from bin_init/bin_cogs_to_ignore.tsv. They have too many shared COGs, 10 of them, though not really in common with each other, so they should not be run. Corrected. All changes have been pushed and /mnt/gpfs/Hackathon/FMTMeren/CoAssembly77_Rerun is running.
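As a rough illustration of the fix for Issue 1, a script can take bin names as explicit command-line arguments instead of inferring them from a naming pattern. This is only a sketch: the function name `parse_bins`, the argument name `bins`, and the example bin names are hypothetical and are not taken from Common_unitigs.py.

```python
import argparse

def parse_bins(argv=None):
    """Return the bin names given on the command line (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Sketch: receive bin names explicitly instead of parsing them."
    )
    # One or more bin names; the script no longer depends on how bins are named.
    parser.add_argument("bins", nargs="+", help="names of the bins to process")
    return parser.parse_args(argv).bins

# Example call with made-up bin names, including a merged bin.
bins = parse_bins(["Bin_134", "bin_merged_3"])
print(bins)
```

With this approach, merged bins are handled like any other bin as long as the caller lists them.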

chrisquince commented 5 years ago

OK so the changed script does not actually run the merged bins. I think that is because the naming convention fails to match:

def bin_paths_by_type(bin_type):
    return [os.path.dirname(path) for path in glob.glob("subgraphs/%s/Bin*/SCG.fna" % bin_type)]

I am going to try rationalising the naming convention to see if this fixes it.
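A minimal sketch of how a `Bin*` glob component could miss merged bins, assuming (hypothetically) that their directory names do not start with `Bin`. The directory names below are made up for illustration, and `fnmatch.fnmatchcase` stands in for the case-sensitive matching that `glob.glob` performs on Linux.

```python
import fnmatch

# Hypothetical directory names; the real layout under subgraphs/<bin_type>/ may differ.
candidate_dirs = ["Bin_134", "Bin_137", "bin_merged_3"]

def matching_dirs(names, pattern="Bin*"):
    """Return the names that a case-sensitive glob component `pattern` matches."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]

matched = matching_dirs(candidate_dirs)
missed = [name for name in candidate_dirs if name not in matched]
print("matched:", matched)
print("missed:", missed)
```

Any directory that ends up in `missed` would be silently skipped by the list comprehension, which is consistent with the merged bins not being run.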

chrisquince commented 5 years ago

OK pushed changes to Common_Unitigs.py that seemed to fix this. Will close when I have tested on FMTMeren.

chrisquince commented 5 years ago

Yes this seems to work so am now closing the issue.