KennthShang / PhaGCN2.0


ModuleNotFoundError: No module named 'sklearn' #3

Closed MoriatyZy closed 2 months ago

MoriatyZy commented 1 year ago

Description of bug

Hello,

I used PhaGCN2 to classify viral contigs, but it stopped after running for 20 hours. The log file was very long and contained several errors, as follows:

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Namespace(database='Caudoviridae', dropout=0, epochs=200, hidden=64, learning_rate=0.01, max_degree=3, model='gcn', weight_decay=0.0005)
Traceback (most recent call last):
  File "run_GCN.py", line 12, in <module>
    from sklearn.metrics import classification_report
ModuleNotFoundError: No module named 'sklearn'
rm: cannot remove ‘validation/’: No such file or directory
rm: cannot remove ‘stride50_val/’: No such file or directory
rm: cannot remove ‘int_val/’: No such file or directory
rm: cannot remove ‘filtered_val/’: No such file or directory
rm: cannot remove ‘dataset/’: No such file or directory
rm: cannot remove ‘split_long_reads_val/’: No such file or directory

Running Diamond...
GCN Error for file contig_0
GCN Error for file contig_1
GCN Error for file contig_2
GCN Error for file contig_3
GCN Error for file contig_4
GCN Error for file contig_5
GCN Error for file contig_6
GCN Error for file contig_7
GCN Error for file contig_8
GCN Error for file contig_9
……

I tried running export MKL_SERVICE_FORCE_INTEL=1 before running run_Speed_up.py, but it didn't work.

Besides, I would like to know how to specify the path to the output file. If I use the following command, the output files appear in the installation folder (PhaGCN2.0):

$ python run_Speed_up.py --contigs contigs.fa --len 8000


Please help me troubleshoot. Much appreciated!

Thank you,

-Nora

log

PhaGCN.txt

Operating System

Linux 3.10.0-1062.el7.x86_64

KennthShang commented 1 year ago

Hi,

For the first problem:

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.

This is a system error.

One easy way to fix this is to revise the header in run_GCN.py as below:

import  numpy as np
from    data import load_data, preprocess_features, preprocess_adj, sample_mask
import  model
from    config import  args
from    utils import masked_loss, masked_acc
import  pickle as pkl
import  scipy.sparse as sp
import  torch
from    torch import nn
from    torch import optim
from    torch.nn import functional as F

Or you can download the latest run_GCN.py. I have updated them accordingly.
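The error message itself suggests two remedies: import numpy first, or set the threading layer. The revised header above takes the first route. A minimal sketch of the second route follows; it assumes that the GNU threading layer is acceptable for this workload, which this thread does not confirm for PhaGCN2. The helper name set_mkl_threading_layer is hypothetical and not part of PhaGCN2.

```python
import os


def set_mkl_threading_layer(layer="GNU", env=os.environ):
    """Set MKL_THREADING_LAYER unless one is already chosen.

    Returns the value that is in effect afterwards. The default "GNU"
    is an assumption, not a setting confirmed by the PhaGCN2 authors.
    """
    return env.setdefault("MKL_THREADING_LAYER", layer)


# Call this before importing numpy or torch so the setting is visible
# when MKL is initialised.
set_mkl_threading_layer()

# The rest of the script would then import numpy first, as in the
# revised header:
# import numpy as np
# import torch
```

Because setdefault is used, a value exported in the shell before the run still takes precedence over the default in the script.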

For the second problem:

ModuleNotFoundError: No module named 'sklearn'

Please try to install sklearn using

pip install scikit-learn

Or follow the suggestion above (the revised header removes the sklearn import).
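Since the missing module only surfaced after a 20-hour run, it may be worth checking the environment before starting. The sketch below is one possible pre-flight check, not part of PhaGCN2. The function name missing_modules is hypothetical, and the module list is an assumption based only on the imports shown in this thread. Note that the package is installed as scikit-learn but imported as sklearn.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the
    interpreter running this script (nothing is actually imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules taken from the imports mentioned in this thread (an assumption,
# not a complete dependency list for PhaGCN2).
missing = missing_modules(["numpy", "scipy", "torch", "sklearn"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

Run it with the same python executable that will launch run_Speed_up.py, since a module installed in a different environment would not be visible.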

Best, Jiayu

MoriatyZy commented 1 year ago

Thank you for your prompt and useful reply! I have solved the problem by following your suggestion. Also, I would like to know whether there is a command to specify the path to the output file. I want to run multiple PhaGCN2 jobs at the same time, and without such an option the output files would be overwritten.

KennthShang commented 1 year ago

If you are only solving the phage-related task, PhaBOX will help.

However, I am only a collaborator of PhaGCN2 and the main manager is @yuanwenguang666. I am not sure whether he has time to upgrade the program.

As far as I know, I am afraid the current version does not have such a parameter for the user to specify the output path.

Best, Jiayu

yuanwenguang666 commented 1 year ago

I apologize for noticing this problem late.

The first issue has been fixed in the update of 18 February 2023. You can update it by cloning new code.

For the second question, as the software is run in batches, the current version does not have a parameter that lets the user specify the output path. If you want to run multiple PhaGCN2 jobs, we recommend copying PhaGCN2's folder so that each copy predicts a different sequence file simultaneously. (P.S. Running too many jobs at the same time may exhaust memory, so please limit the number of simultaneous runs.)

For example:

cp -r PhaGCN2.0 ~/PhaGCN2.1/
python3 PhaGCN2.0/run_Speed_up.py --contigs total1.fasta --len 2000
python3 PhaGCN2.1/run_Speed_up.py --contigs total2.fasta --len 2000
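The copy-and-run approach above can be scripted. The following is a rough sketch, not an official PhaGCN2 tool. The function names plan_jobs and run_jobs, the _jobN folder naming, and the max_parallel limit are all hypothetical. It also assumes that run_Speed_up.py works when started from inside its own folder and accepts an absolute path for --contigs, neither of which is confirmed in this thread.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def plan_jobs(install_dir, fasta_files, work_root, min_len=2000):
    """Return one (copy_dir, command) pair per input file.

    Nothing is copied or executed here; this only decides where each
    copy of the install folder will live and which command it will run.
    """
    install_dir = Path(install_dir)
    jobs = []
    for i, fasta in enumerate(fasta_files, start=1):
        copy_dir = Path(work_root) / f"{install_dir.name}_job{i}"
        cmd = [sys.executable, "run_Speed_up.py",
               "--contigs", str(Path(fasta).resolve()),  # absolute path: an assumption
               "--len", str(min_len)]
        jobs.append((copy_dir, cmd))
    return jobs


def run_jobs(install_dir, jobs, max_parallel=2):
    """Copy the install folder for each job and run the jobs, keeping at
    most max_parallel processes alive to limit memory use."""
    max_parallel = max(1, max_parallel)
    pending = list(jobs)
    running = []
    while pending or running:
        # Start new jobs until the parallel limit is reached.
        while pending and len(running) < max_parallel:
            copy_dir, cmd = pending.pop(0)
            shutil.copytree(install_dir, copy_dir)
            running.append(subprocess.Popen(cmd, cwd=copy_dir))
        # Wait for the oldest running job before starting more.
        running.pop(0).wait()
```

A call such as run_jobs("PhaGCN2.0", plan_jobs("PhaGCN2.0", ["total1.fasta", "total2.fasta"], "runs")) would then give each input its own copy of the folder, so outputs from different jobs do not overwrite each other.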

Thank you for your question.

All the best, Wen-Guang