PengNi / deepsignal2

GNU General Public License v3.0

extract_feature.py's option parameter #10

Open YuYangmio opened 2 years ago

YuYangmio commented 2 years ago

Dear Peng, I want to try to train your model, but in extract_feature.py I found the option --positions. Is this the minimap2 file described in the article? I see that the default is None. Does including the position information have any effect on model training? Best, Yu

PengNi commented 2 years ago

--positions specifies high-confidence sites (e.g., sites that have a methylation frequency of 0 or 1 in WGBS) to use when you want to extract training samples. You can also extract all samples first (without setting --positions), then generate the training samples from them using your own scripts.
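
The second approach (extract everything, then select the training samples yourself) could look roughly like the sketch below. It is not part of deepsignal2: `load_positions` and `filter_features` are hypothetical helper names, and the sketch assumes that both the positions file and the feature file are tab-separated with chromosome, position, and strand in their first three columns. Please check the column layout of your own files before relying on it.

```python
from typing import Iterable, Iterator, Set, Tuple

# A site is (chromosome, position, strand); positions are kept as strings
# so that they compare exactly as written in the files.
Site = Tuple[str, str, str]


def load_positions(lines: Iterable[str]) -> Set[Site]:
    """Collect high-confidence sites from tab-separated lines.

    Assumption: columns 1-3 are chromosome, position, strand.
    """
    sites: Set[Site] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            sites.add((fields[0], fields[1], fields[2]))
    return sites


def filter_features(lines: Iterable[str], sites: Set[Site]) -> Iterator[str]:
    """Yield only the feature lines whose site is in `sites`.

    Assumption: each feature line also starts with chromosome, position, strand.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and (fields[0], fields[1], fields[2]) in sites:
            yield line


# Example with hypothetical file names:
#   with open("positions.tsv") as p, open("all_features.tsv") as f, \
#           open("train_features.tsv", "w") as out:
#       out.writelines(filter_features(f, load_positions(p)))
```

Reading the sites into a set keeps the lookup per feature line cheap, and streaming the feature file line by line avoids loading a large TSV into memory.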

Best, Peng

YuYangmio commented 2 years ago

Dear Peng, When I tried to train your model with the Escherichia coli reference data set from your paper, I got the following error when running Guppy on the R9 data: "Fast5 read file is invalid. Raw data field 'median_before' has wrong type." Have you ever encountered this problem? May I ask what the solution is? Best, Yu

PengNi commented 2 years ago

Hi @YuYangmio, I am not sure what exactly the issue is. Maybe the R9 data is too old for Guppy to process; R9 pore reads may have been deprecated by ONT. I suggest using some newer data (like R9.4.1/R10.3) for your test.

Best, Peng

YuYangmio commented 2 years ago

Dear Peng, When I tried to train your model again, I found that the result of deepsignal2 extract is strange: the methy_label value is always 1 and never 0. Usually, two-class data contains both 0 and 1. When I used the tsv data to train the model, I got ACC=1.0 and LOSS=0. The command I used was:

deepsignal2 extract -i ../Notts/FAF15665-16056159 -o human.fast5s.CG.features.tsv --corrected_group RawGenomeCorrected_000 --nproc 30 --motifs CG

PengNi commented 2 years ago

When you want to extract samples with negative labels, --methy_label should be set to 0.
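
An accuracy of 1.0 with zero loss is what one would expect when every training sample carries the same label, because the model only has to predict a single class. A minimal sketch of a sanity check before training is given below. It assumes that the label is the last tab-separated column of each feature line (an assumption about the file layout), and `count_labels` and `mix_samples` are hypothetical helper names, not part of deepsignal2.

```python
import random
from collections import Counter
from typing import Iterable, List


def count_labels(lines: Iterable[str]) -> Counter:
    """Count how often each label occurs.

    Assumption: the label is the last tab-separated column of each line.
    """
    return Counter(
        line.rstrip("\n").split("\t")[-1] for line in lines if line.strip()
    )


def mix_samples(positive: List[str], negative: List[str], seed: int = 0) -> List[str]:
    """Combine positive and negative feature lines and shuffle them.

    A fixed seed keeps the shuffle reproducible between runs.
    """
    combined = positive + negative
    random.Random(seed).shuffle(combined)
    return combined
```

If `count_labels` reports only one label for the training file, a second feature file extracted with --methy_label 0 (from reads you treat as negative samples) can be mixed in with `mix_samples` before training.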

Best, Peng

PengNi commented 2 years ago

Maybe you can check issue #7 for more information.