Merck / deepbgc

BGC Detection and Classification Using Deep Learning
https://doi.org/10.1093/nar/gkz654
MIT License
123 stars 27 forks

Error during detector training #73

Open esraagithub opened 2 years ago

esraagithub commented 2 years ago

Hello, I ran into a problem while training a detector on my sample.

Here is the error message. It says I didn't use a negative dataset, but I actually did: it is called GeneSwap_Negatives.pfam.tsv. I think deepbgc can't see the negative dataset because of an error in my command. --help didn't say where or how to pass it.

  'optimizer': 'adam',
                  'shuffle': True,
                  'timesteps': 256,
                  'validation_size': 0,
                  'verbose': 1,
                  'weighted': True},
'input_params': {   'features': [   {'type': 'ProteinBorderTransformer'},
                                    {   'type': 'Pfam2VecTransformer',
                                        'vector_path': 'pfam2vec.csv'}]},
'type': 'KerasRNN'}

INFO 15/05 09:28:22 Loaded 41102 samples and 80777 domains from sample1_deepbgc_prepare_result.tsv
INFO 15/05 09:28:28 Loaded 10128 samples and 706950 domains from GeneSwap_Negatives.pfam.tsv
ERROR 15/05 09:28:33 Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']
Traceback (most recent call last):
  File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/command/train.py", line 60, in run
    train_samples, train_y = util.read_samples(inputs, target)
  File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/util.py", line 574, in read_samples
    'Did you provide positive and negative samples?')
ValueError: ("Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']", 'At least two values are required to train a model. ', 'Did you provide positive and negative samples?')
ERROR 15/05 09:28:33 ================================================================================
ERROR 15/05 09:28:33 DeepBGC failed with ValueError: Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']
ERROR 15/05 09:28:33 ================================================================================
ERROR 15/05 09:28:33 At least two values are required to train a model.
ERROR 15/05 09:28:33 Did you provide positive and negative samples?
ERROR 15/05 09:28:33 ================================================================================

My command:

deepbgc train --model templates/deepbgc.json --output MyDeepBGCDetector.pkl sample1_deepbgc_prepare_result.tsv GeneSwap_Negatives.pfam.tsv --config PFAM2VEC pfam2vec .csv -v ClusterFinder_Annotated_Contigs.full.gbk

prihoda commented 2 years ago

Hi @esraagithub, your file sample1_deepbgc_prepare_result.tsv contains the BGC samples, is that correct? This file will need to contain an in_cluster column, which will have a value of 1 in all rows (in case the file only contains "positive" BGC samples). Your file should also contain a sequence_id column which should contain an identifier of each BGC.
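The two requirements above (an in_cluster column and a sequence_id column) can be checked before training. The following is a minimal sketch, not part of DeepBGC: the helper name `summarize_targets` is hypothetical, and it assumes the prepared file is tab-separated with a header row. It only reads the file and reports which in_cluster values occur and how many distinct sequence_id values there are.

```python
import csv
from collections import Counter


def summarize_targets(path, target="in_cluster", id_column="sequence_id"):
    """Hypothetical helper: count target values and collect sequence IDs.

    Assumes a tab-separated file with a header row (an assumption about
    the prepared Pfam TSV, not something DeepBGC guarantees here).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        missing = [name for name in (target, id_column) if name not in columns]
        if missing:
            raise ValueError("%s is missing column(s): %s" % (path, ", ".join(missing)))
        counts = Counter()
        sequence_ids = set()
        for row in reader:
            counts[row[target]] += 1          # values are kept as strings, e.g. "0" or "1"
            sequence_ids.add(row[id_column])
    return counts, sequence_ids
```

Calling `summarize_targets("sample1_deepbgc_prepare_result.tsv")` and `summarize_targets("GeneSwap_Negatives.pfam.tsv")` and comparing the returned counters would show whether both files carry only the value 0, which is the situation the "only one value {0}" error describes.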

esraagithub commented 2 years ago

@prihoda Thank you for your response. Yes, the file sample1_deepbgc_prepare_result.tsv is the output of the deepbgc prepare command. It does contain a sequence_id column and an in_cluster column, but the in_cluster column has 0 in all rows, not 1. I don't know why it contains only zeros.

prihoda commented 2 years ago

Hi @esraagithub, if that file contains just BGC samples, you can manually change the value to 1 in all rows.
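For a large file, the manual change can be scripted. This is a sketch under assumptions rather than a DeepBGC feature: the helper name `set_in_cluster` is hypothetical, the file is assumed to be tab-separated with a header row, and the result is written to a new file so the original stays untouched. It is only appropriate when every row in the file really belongs to a BGC.

```python
import csv


def set_in_cluster(src_path, dst_path, value=1):
    """Hypothetical helper: copy a TSV, overwriting in_cluster in every row.

    Returns the number of data rows written.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        if reader.fieldnames is None or "in_cluster" not in reader.fieldnames:
            raise ValueError("no in_cluster column found in " + src_path)
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        rows_written = 0
        for row in reader:
            row["in_cluster"] = str(value)   # mark the row as positive (inside a BGC)
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

A call such as `set_in_cluster("sample1_deepbgc_prepare_result.tsv", "sample1_positive.tsv")` would produce a copy with in_cluster set to 1 throughout; the output filename here is only an example. The negative file keeps its 0 values, so the two inputs together provide both target values.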

esraagithub commented 2 years ago

Thank you, I will try it.

On Saturday, 21 May 2022 at 8:55 PM, David Příhoda @.***> wrote:

Hi @esraagithub, if that file contains just BGC samples, you can manually change the value to 1 in all rows

esraagithub commented 2 years ago

Thank you, I tried it and it worked well, but I get another error in the next step, "training the classifier":

    raise ValueError('No overlap found between classes and samples. Classes should be indexed by sequence_id.')
ValueError: No overlap found between classes and samples. Classes should be indexed by sequence_id.
ERROR 23/05 23:53:18 ================================================================================
ERROR 23/05 23:53:18 DeepBGC failed with ValueError: No overlap found between classes and samples. Classes should be indexed by sequence_id.
ERROR 23/05 23:53:18 ================================================================================

I have a "sequence id" column in the sample file (which I got from deepbgc prepare), but there is no overlap between it and the classes file. What should I do in this case? Does this mean I can't proceed with training? @prihoda
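One way to see why no overlap is found is to compare the identifiers in the two files directly. The sketch below is not DeepBGC code: the helper names `read_column` and `compare_ids` are hypothetical, and it assumes the sample file is tab-separated with a sequence_id column while the classes file is comma-separated with the identifiers in its first column. Both are assumptions that may need adjusting to the actual files.

```python
import csv


def read_column(path, column=None, delimiter="\t"):
    """Hypothetical helper: return the set of values in one column.

    `column` is a header name; None means "use the first column".
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        index = 0 if column is None else header.index(column)
        return {row[index] for row in reader if len(row) > index}


def compare_ids(samples_path, classes_path, classes_delimiter=","):
    """Return (shared, only_in_samples, only_in_classes) identifier sets."""
    sample_ids = read_column(samples_path, "sequence_id")
    # Assumption: the classes file keeps its identifiers in the first column.
    class_ids = read_column(classes_path, None, classes_delimiter)
    return sample_ids & class_ids, sample_ids - class_ids, class_ids - sample_ids
```

If the first returned set is empty, printing a few entries from the other two sets side by side usually shows whether the identifiers differ only in form (for example a version suffix or a file extension) or refer to different sequences altogether.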