reads2bins.py – missing 'format' issue #11

harfeorth commented 3 years ago

Dear Anuradha, I am very interested in the approach used by MetaBCC-LR. I've successfully generated bins etc., but I'm not good with Python and hit an issue when running the bin-splitting script, reads2bins.py:

Traceback (most recent call last):
  File "/opt/miniconda/envs/pacbio_mag/MetaBCC-LR/reads2bins.py", line 29, in <module>
    for record, bin_id in zip(SeqIO.parse(readsPath), open(readBinsPath)):
TypeError: parse() missing 1 required positional argument: 'format'

Does 'format' here refer to the input data type, FASTA or FASTQ (my data is FASTQ)? The variable 'readsType' is defined above (lines 17-20) but is never passed to the SeqIO.parse call. Is something missing from the script?
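For anyone hitting the same error: Biopython's `SeqIO.parse(handle, format)` requires the format name (e.g. "fasta" or "fastq") as its second argument, so the call in the traceback fails before reading anything. The sketch below is only an illustration, not the script's actual fix. Because lines 17-20 of reads2bins.py are not shown here, it assumes the format is inferred from the file extension; `infer_format`, `iter_records` and the `_FORMATS` table are hypothetical names.

```python
import os

# Hypothetical extension table; the real readsType logic in reads2bins.py
# (lines 17-20) is not shown in this thread and may differ.
_FORMATS = {
    ".fa": "fasta",
    ".fasta": "fasta",
    ".fna": "fasta",
    ".fq": "fastq",
    ".fastq": "fastq",
}


def infer_format(path: str) -> str:
    """Return the Biopython format name ("fasta" or "fastq") for a reads file."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return _FORMATS[ext]
    except KeyError:
        raise ValueError(f"cannot infer FASTA/FASTQ format from {path!r}") from None


def iter_records(reads_path: str):
    """Iterate over reads, passing the format argument the failing call omitted."""
    # Imported lazily so infer_format() works even without Biopython installed.
    from Bio import SeqIO

    # SeqIO.parse(handle, format): calling it with only the path raises
    # "parse() missing 1 required positional argument: 'format'".
    return SeqIO.parse(reads_path, infer_format(reads_path))
```

Note that a compressed file such as reads.fastq.gz would be rejected by this sketch, since its last extension is ".gz".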

Regards, and thank you for developing this tool, David

anuradhawick commented 3 years ago

Dear Dr Green (@harfeorth),

Thanks a lot for raising this issue. I had missed this detail; it is fixed now. I do apologise for the inconvenience.

Please feel free to get in touch if you need any assistance with tuning parameters, viewing plots, etc. A few remarks:

- Visualised bin plots are available in images.
- Use --resume with different --sensitivity values to change the level of bin fragmentation.
- Changing the embedding with the --embedding parameter (tsne, umap or song) can help bin different datasets.

Kind regards, Anuradha

anuradhawick commented 3 years ago

The issue has been fixed in 872dc85b36e89c910445ef34388c04c7ddcf634b
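The traceback shows that reads2bins.py pairs the i-th read with the i-th line of the read-bins file via `zip(...)`. The following is a minimal sketch of that splitting step with the format supplied. It is not the code from the commit above. `group_by_bin`, `split_reads` and the `bin-<id>.<fmt>` output naming are hypothetical, and the sketch assumes one bin ID per line, in the same order as the reads.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterable, List, TypeVar

R = TypeVar("R")
_MISSING = object()


def group_by_bin(records: Iterable[R], bin_lines: Iterable[str]) -> Dict[str, List[R]]:
    """Pair the i-th read with the i-th line of the bins file and group reads by bin ID."""
    groups: Dict[str, List[R]] = defaultdict(list)
    # zip() would silently stop at the shorter input; zip_longest lets us detect
    # a mismatch between the number of reads and the number of bin assignments.
    for record, line in zip_longest(records, bin_lines, fillvalue=_MISSING):
        if record is _MISSING or line is _MISSING:
            raise ValueError("reads file and bins file have different lengths")
        groups[line.strip()].append(record)
    return dict(groups)


def split_reads(reads_path: str, bins_path: str, fmt: str) -> None:
    """Write one file per bin, e.g. bin-3.fastq (hypothetical naming scheme)."""
    from Bio import SeqIO  # requires Biopython; imported lazily

    with open(bins_path) as bins:
        # fmt ("fasta" or "fastq") is the argument missing from the original call.
        groups = group_by_bin(SeqIO.parse(reads_path, fmt), bins)
    for bin_id, reads in groups.items():
        SeqIO.write(reads, f"bin-{bin_id}.{fmt}", fmt)
```

This version keeps every read in memory before writing. For very large datasets, opening one output handle per bin and writing each record as it is read would avoid that.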

harfeorth commented 3 years ago

Dear Anuradha,

Thank you so much for the speedy changes. I've just run this and it's worked perfectly!

Yes, I do want to look at the tuning parameters: in addition to trying different embedding parameters, would you recommend testing the --k-size and --sensitivity settings as the main other parameters to change?

All the best, David

From: Anuradha Wickramarachchi
Sent: 03 June 2021 03:12
Subject: Re: [anuradhawick/MetaBCC-LR] reads2bins.py – missing 'format' issue (#11)

anuradhawick commented 3 years ago

Dear Dr Green (@harfeorth),

--k-size and --sensitivity do indeed affect the results. Ideally, you should expect better results with a higher k-size (4 to 5) on more accurate reads (CCS or HiFi), although this may need a little more memory. The sensitivity parameter determines how well the clusters are separated: higher sensitivity gives more fragmentation and more bins, while lower sensitivity gives less fragmentation and fewer bins.
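To see why a larger k-size costs memory, consider a k-mer composition profile that stores one value per possible k-mer, giving 4**k slots per read. The sketch below assumes --k-size sets the length of the k-mers counted in each read's profile. It is not MetaBCC-LR's implementation: the tool may use canonical k-mers or 32-bit floats, which shrink the absolute numbers but not the roughly fourfold growth per step in k. `kmer_profile` and `profile_bytes` are hypothetical helpers.

```python
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int) -> List[float]:
    """Normalised k-mer frequency vector with one slot per possible k-mer (4**k)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i : i + k])
        if j is not None:  # skip k-mers containing N or other ambiguous bases
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def profile_bytes(n_reads: int, k: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense matrix of profiles, assuming 8-byte (float64) values."""
    return n_reads * 4**k * bytes_per_value


for k in (3, 4, 5):
    print(k, 4**k, profile_bytes(1_000_000, k))
```

Under these assumptions, 1,000,000 reads need about 0.5 GB of profile storage at k = 3 (64 slots per read), 2 GB at k = 4 (256 slots) and 8 GB at k = 5 (1,024 slots).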

I have just fixed a bug in the --resume feature, so kindly do a git pull before you proceed.

I am also planning to improve this tool with more robust clustering and will give an update here.

P.S.

Our latest work, LRBinner (accepted at WABI 2021), addresses these parameter limitations. The code is being refactored over the coming few days (though it should still work fine) and will be ready for you to test if you are interested.

Best regards, Anuradha
