Something about "subset*.tfrec"

MicrobeLab / DeepMicrobes

DeepMicrobes: taxonomic classification for metagenomics with deep learning

https://doi.org/10.1093/nargab/lqaa009

Apache License 2.0

Something about "subset*.tfrec" #34

Open yongrenr opened 1 month ago

yongrenr commented 1 month ago

Hello! I think your work is great. I'd love to try running your code, but after following your steps I ran into an error. Can you tell me how to fix it? Looking forward to your reply!

MicrobeLab commented 1 month ago

Hi, the issue seems to be caused by parallel, not DeepMicrobes. Please try installing parallel yourself first (rather than using the copy provided here) and make sure the installation works.

yongrenr commented 1 month ago

Thank you for the prompt reply! I have solved this problem; it was caused by my Linux system itself. I have now run into another problem. What is the expected format of the input file, and how is it read? I tried to input my own FASTA file, but the following problem occurred:

I tried changing label_id = int(identifier.split('|')[1]) to label_id = int(identifier.split(' ')[1]), but the above ASCII problem still occurred. Looking forward to your reply!

MicrobeLab commented 1 month ago

Hi, I'm not sure how to parse the seq_id in your file. This issue is not related to the DeepMicrobes code. The index error shows that the code could not get the second field of the header, which it expects to be a number.
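The failure described above can be reproduced and diagnosed with a small standalone helper. This is only a sketch, not the DeepMicrobes parser: `parse_label_id` is a hypothetical name, and the only behaviour taken from the thread is that the label is read as `int(identifier.split('|')[1])`. It reports which part of the header is missing or non-numeric instead of raising a bare IndexError.

```python
def parse_label_id(identifier: str, sep: str = "|") -> int:
    """Return the integer stored in the second sep-delimited field of a header.

    Hypothetical helper; it mirrors int(identifier.split('|')[1]) from the
    thread but raises a descriptive error when the header does not fit.
    """
    # Drop a leading FASTA/FASTQ marker and surrounding whitespace, if present.
    fields = identifier.lstrip(">@").strip().split(sep)
    if len(fields) < 2:
        # This is the situation that surfaces as an IndexError in the thread.
        raise ValueError(
            f"header {identifier!r} has no second field when split on {sep!r}"
        )
    try:
        return int(fields[1])
    except ValueError:
        raise ValueError(
            f"second field {fields[1]!r} of header {identifier!r} is not an integer"
        ) from None


if __name__ == "__main__":
    print(parse_label_id("label|42|read_1"))    # second '|' field is the label
    print(parse_label_id("read_1 7", sep=" "))  # space-delimited variant
```

Running the helper over a few headers from your own file shows quickly whether the separator or the field position is the mismatch.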

yongrenr commented 1 month ago

Hello, I am very interested in your work and have a few simple questions:

1. I want to reproduce your final classification experiment. Is the data you used DeepMicrobes/mag_reads_150bp_1w? How do I run them in batches?

2. If I have two files, label.txt and sequence.txt, how do I use your model for training and classification? label.txt contains the processed classification labels, and sequence.txt contains the sequences from the FASTA file. I look forward to hearing from you!

MicrobeLab commented 1 month ago

Hi,

Sorry, I'm not sure what you mean by "in batches".
I would recommend formatting your fastq headers the same way as ours.
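One way to follow this advice when labels and sequences live in separate files is to merge them into records whose header carries the label. The sketch below rests on several assumptions: `labeled_fasta_records` is a hypothetical helper, both inputs are assumed to hold one entry per line in matching order, and the `label|<id>|read_<n>` layout is illustrative. The only constraint taken from this thread is that the second `|`-delimited field must be an integer, so check the repository's own example files for the exact header format.

```python
from typing import Iterable, Iterator


def labeled_fasta_records(
    sequences: Iterable[str],
    labels: Iterable[str],
    prefix: str = "label",  # assumed prefix, not confirmed by the thread
) -> Iterator[str]:
    """Yield FASTA records whose header stores the label as its second field."""
    # Skip blank lines and strip newlines so both inputs line up one-to-one.
    seqs = [s.strip() for s in sequences if s.strip()]
    labs = [int(l) for l in labels if str(l).strip()]
    if len(seqs) != len(labs):
        raise ValueError(
            f"{len(seqs)} sequences but {len(labs)} labels; they must match"
        )
    for i, (seq, lab) in enumerate(zip(seqs, labs)):
        # Header layout: <prefix>|<integer label>|<read name>
        yield f">{prefix}|{lab}|read_{i}\n{seq}\n"


if __name__ == "__main__":
    demo = labeled_fasta_records(["ACGT\n", "TTGA\n"], ["3\n", "7\n"])
    print("".join(demo), end="")
```

With real files, the two arguments can be open file handles (for example `open("sequence.txt")` and `open("label.txt")`), and the yielded records can be written to a new FASTA file with `writelines`.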

yongrenr commented 1 month ago

Hello, sorry that I did not express myself clearly. I would like to ask whether the label_id in seq, label_id = training_set_read_parser(rec), shown in the image, is the label corresponding to seq. If so, I think my approach should work. What do you think? Looking forward to hearing from you!

MicrobeLab commented 1 month ago

I think you can feel free to change the code as long as it doesn't introduce any bugs :)