PacificBiosciences / ANGEL

Robust Open Reading Frame prediction (ANGLE re-implementation)

Unable to get classifier.pickle result #30

Closed chuanyingliu closed 4 years ago

chuanyingliu commented 5 years ago

Hi, when I run angel_train.py, I cannot get the pickle file, even though the program has been running for 10 days. The execution log is as follows: (image)

The script is as follows:

export PATH="/NJPROJ1/PB/personal_dir/liuchuanying/software/Miniconda2/bin:$PATH"
source activate /NJPROJ1/PB/personal_dir/liuchuanying/software/Miniconda2/envs/anaCogent
export PATH="/NJPROJ2/PB/pipeline/Pacbio_Isoseq_noref_V3.0/software/cd-hit-v4.6.8-2017-0621/:$PATH"

cd /NJPROJ1/PB/personal_dir/liuchuanying/Angel/Gallus_chicken/cd-hit_angel_train
/NJPROJ1/PB/personal_dir/liuchuanying/software/Miniconda2/envs/anaCogent/bin/angel_train.py /NJPROJ1/PB/personal_dir/liuchuanying/Angel/Gallus_chicken/cd-hit_angel_train/Gallus.dumb.final.training.cds /NJPROJ1/PB/personal_dir/liuchuanying/Angel/Gallus_chicken/cd-hit_angel_train/Gallus.dumb.final.training.utr Gallus_chicken.classifier.pickle --cpus 12

I have submitted the job multiple times, and each time it gets stuck at "Done with records". There is no shortage of memory.

So, I want to know the reason why I cannot get the pickle file.

Thanks a lot.

Magdoll commented 5 years ago

Hi @chuanyingliu ,

How many sequences were input for training?

Could you try running with a smaller input training set?

-Liz

chuanyingliu commented 5 years ago

Hi @Magdoll, I followed the process described in the ANGEL documentation. First, I ran dumb ORF prediction on the RNA sequences, which gave more than 40,000 ORFs, and then removed the redundant ones. I then trained the pickle on the 500 CDS sequences generated by angel_make_training_set.py. Why does the program select only 500 sequences for training?

Magdoll commented 5 years ago

It's not really necessary to train with more sequences. Could you try training with 200-250 and see if it goes through?
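One way to cut a training set down to roughly 250 sequences is to subsample the CDS file at random and keep only the matching UTR records. The sketch below is plain Python and is not part of ANGEL. The function names are hypothetical, and it assumes that a CDS record and its UTR record share the same ID (the first whitespace-delimited token of the FASTA header), which should be checked against the actual files before use.

```python
import random


def record_id(header):
    """Return the first whitespace-delimited token of a FASTA header."""
    tokens = header.split()
    return tokens[0] if tokens else ""


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, path):
    """Write (header, sequence) tuples as FASTA, one sequence line per record."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">%s\n%s\n" % (header, seq))


def subsample_pair(cds_in, utr_in, cds_out, utr_out, n=250, seed=0):
    """Pick up to n CDS records at random and keep UTR records with the same ID."""
    cds = read_fasta(cds_in)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    picked = rng.sample(cds, min(n, len(cds)))
    # Assumption: CDS and UTR records are linked by an identical ID token.
    ids = {record_id(header) for header, _ in picked}
    utr = [rec for rec in read_fasta(utr_in) if record_id(rec[0]) in ids]
    write_fasta(picked, cds_out)
    write_fasta(utr, utr_out)
    return len(picked), len(utr)
```

If the two returned counts differ sharply, the ID assumption probably does not hold for the files at hand, and the subset should not be used for training as-is.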

chuanyingliu commented 5 years ago

@Magdoll Thank you very much for your advice. Training now completes and I have a pickle file.

But I still have a few questions. Why can I train a pickle file with only a few (20-300) CDS sequences, and what difference does the number make? The resulting pickle files are all very similar in size, only a little over 100 KB. What information is recorded in this file? What is the principle behind training the pickle file?

I hope to get the answer from you, thank you very much.
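One way to see what a pickle file records, without running any code stored in it, is to list the classes and functions it references. The sketch below uses only the standard-library `pickletools` module and is not specific to ANGEL. The function names are hypothetical, and it assumes the file was written with pickle protocol 0-3 (typical for Python 2), where such references appear as `GLOBAL` opcodes.

```python
import pickletools


def referenced_globals(pickle_bytes):
    """List the 'module name' pairs a pickle refers to, without unpickling it."""
    found = []
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        # Protocols 0-3 name classes and functions through the GLOBAL opcode.
        # Protocol 4+ uses STACK_GLOBAL instead, which this sketch does not cover.
        if opcode.name == "GLOBAL" and arg not in found:
            found.append(arg)
    return found


def summarize_pickle(path):
    """Return (size in bytes, referenced globals) for a pickle file on disk."""
    with open(path, "rb") as handle:
        data = handle.read()
    return len(data), referenced_globals(data)
```

Calling `summarize_pickle("Gallus_chicken.classifier.pickle")` would return the file size and the referenced names, which indicate what kind of object was saved. A stored model holding a fixed number of parameters would also be consistent with the file size barely changing as the number of training sequences changes.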

Magdoll commented 4 years ago

ANGEL has been updated to v3.0. If the problem persists after updating, please re-open this issue or file a new one.