bergmanlab / mcclintock

Meta-pipeline to identify transposable element insertions using next generation sequencing data

RepeatMasker error #107

Closed: ebhmayra closed this issue 1 year ago

ebhmayra commented 1 year ago

Dear all,

We have installed mcclintock-2.0.0 on our cluster and ran it with the required options:

#!/bin/bash
#SBATCH --job-name=macclintock2
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=00-12:00:00
#SBATCH --mail-type=ALL
# job logs: stdout to the .out file, stderr to the .err file
#SBATCH --output=/scratch/botany/mayra/diospyros/macclintock.out
#SBATCH --error=/scratch/botany/mayra/diospyros/macclintock.err
#SBATCH --partition=basic

# load the cluster's conda module and the McClintock environment
module load conda
conda activate mcclintock-2.0.0

data_folder=/scratch/botany/mayra/diospyros
# output directory for this sample (-p: no error if it already exists)
mkdir -p $data_folder/mcclintok_cherrieriBT262
cd $data_folder/fastqc_trim
# -r reference genome, -c TE consensus library, -1/-2 paired-end reads, -o output directory
mcclintock.py \
    -r /scratch/botany/katie/assembled_genomes/assemblies/impolita/working/impolita.fasta \
    -c $data_folder/impolita-families_mcclintok5.fa \
    -1 $data_folder/fastqc_trim/cherrieriBT262_R1_val_1.fq.gz \
    -2 $data_folder/fastqc_trim/cherrieriBT262_R2 \
    -o $data_folder/mcclintok_cherrieriBT262

But we got the following error:

RepeatMasker -pa 1 -lib /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/impolita/consensus_fasta/consensusTEs.fasta -dir /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/cherrieriBT262_R1_val_1//tmp/repeatmasker -s -nolow -no_is /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/impolita/genome_fasta/impolita.fasta
PROCESSING       Running RepeatMasker &> /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/logs/20230308.200255.8345204/processing.log
can't find Repeatmasker output in:/scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/cherrieriBT262_R1_val_1//tmp/repeatmasker

[Thu Mar  9 08:35:55 2023]
Error in rule repeatmask:
    jobid: 33
    output: /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/cherrieriBT262_R1_val_1/intermediate/impolita.repeatmasker.out
    conda-env: /home/apps/conda/miniconda3/envs/mcclintock-2.0.0/share/mcclintock/install/envs/conda/f760bd75

The processing log file says:

RepeatMasker version open-4.0.7
Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Rebuilding RepeatMaskerLib.embl library
  - Read in 216 sequences from /home/apps/conda/miniconda3/envs/mcclintock-2.0.0/share/mcclintock/install/envs/conda/f760bd75/share/RepeatMasker/Libraries/DfamConsensus.embl
  Saving RepeatMaskerLib.embl library...() Unable to open file /home/apps/conda/miniconda3/envs/mcclintock-2.0.0/share/mcclintock/install/envs/conda/f760bd75/share/RepeatMasker/Libraries/RepeatMaskerLib.embl for writing: Permission denied
RepeatMasker -pa 1 -lib /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/impolita/consensus_fasta/consensusTEs.fasta -dir /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/cherrieriBT262_R1_val_1//tmp/repeatmasker -s -nolow -no_is /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/impolita/genome_fasta/impolita.fasta
RepeatMasker -pa 1 -lib /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/impolita/consensus_fasta/consensusTEs.fasta -dir /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/cherrieriBT262_R1_val_1//tmp/repeatmasker -s -nolow -no_is /scratch/botany/mayra/diospyros/mcclintok_cherrieriBT262/impolita/genome_fasta/impolita.fasta

The error appears to be related to //tmp/repeatmasker, but we don't understand why permission to save the RepeatMaskerLib.embl library was denied. We hope you can help us with this issue.

Best, Mayra

ebhmayra commented 1 year ago

Hello,

We tried to run McClintock again and still get the same RepeatMasker error. Although we provided the -g (GFF) and -t (TSV) files, McClintock still wants to run RepeatMasker. This is what the processing.log file says:

RepeatMasker version open-4.0.7
Search Engine: NCBI/RMBLAST [ 2.11.0+ ]
Rebuilding RepeatMaskerLib.embl library
    Reading Dfam_consensus database...
  - Read in 216 sequences from /home/apps/conda/miniconda3/envs/mcclintock-2.0.0/share/mcclintock/install/envs/conda/f760bd75/share/RepeatMasker/Libraries/DfamConsensus.embl
  Saving RepeatMaskerLib.embl library...() Unable to open file /home/apps/conda/miniconda3/envs/mcclintock-2.0.0/share/mcclintock/install/envs/conda/f760bd75/share/RepeatMasker/Libraries/RepeatMaskerLib.embl for writing: Permission denied

RepeatMasker -pa 7 -lib /scratch/botany/mayra/diospyros/mcclintok_gt_viellardiiBT023_3/impolita/consensus_fasta/consensusTEs.fasta -dir /scratch/botany/mayra/diospyros/mcclintok_gt_viellardiiBT023_3/vieillardiiBT023_R1_val_1//tmp/repeatmasker -s -nolow -no_is /scratch/botany/mayra/diospyros/mcclintok_gt_viellardiiBT023_3/impolita/genome_fasta/impolita.fasta
RepeatMasker -pa 7 -lib /scratch/botany/mayra/diospyros/mcclintok_gt_viellardiiBT023_3/impolita/consensus_fasta/consensusTEs.fasta -dir /scratch/botany/mayra/diospyros/mcclintok_gt_viellardiiBT023_3/vieillardiiBT023_R1_val_1//tmp/repeatmasker -s -nolow -no_is /scratch/botany/mayra/diospyros/mcclintok_gt_viellardiiBT023_3/impolita/genome_fasta/impolita.fasta

Best, Mayra

cbergman commented 1 year ago

Hi @ebhmayra

Thanks for reporting this issue. It appears that your installation of McClintock is trying to write files to a location that your user account doesn't have permission to write to. My suspicion is that a system administrator installed McClintock inside a system-wide miniconda directory that is read-only for your user, and that RepeatMasker is trying to write temporary files in that directory.

To help troubleshoot this issue, can you provide more information on how you installed McClintock (i.e. did you follow the instructions in the README exactly: https://github.com/bergmanlab/mcclintock#dependency)? Also, can you confirm that you successfully ran the test data as detailed here: https://github.com/bergmanlab/mcclintock#examples?

If not, can you try installing McClintock into your user account's home directory following the instructions in the README (https://github.com/bergmanlab/mcclintock#dependency) and running the test dataset (https://github.com/bergmanlab/mcclintock#examples)? If these steps are successful, I would then try launching your job again using the fresh installation in your home directory. If the RepeatMasker problem persists with the McClintock installation in your home directory, please report back here.

Thanks, Casey
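A quick way to confirm this diagnosis, before reinstalling anything, is to test whether your account can write to the RepeatMasker Libraries directory named in the "Permission denied" message. The following is a minimal bash sketch; the helper name check_writable is hypothetical, and the path is copied from the processing.log output above. Because [ -w ] reflects the permissions of whoever runs it, run it as the same user that runs McClintock.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can write to a directory.
# Returns 0 if the directory exists and is writable, 1 otherwise.
check_writable() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    elif [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Directory where RepeatMasker tried to save RepeatMaskerLib.embl (copied from processing.log)
rm_lib_dir=/home/apps/conda/miniconda3/envs/mcclintock-2.0.0/share/mcclintock/install/envs/conda/f760bd75/share/RepeatMasker/Libraries

check_writable "$rm_lib_dir" || echo "RepeatMasker cannot save RepeatMaskerLib.embl here as this user"
```

If the directory is reported as NOT writable, RepeatMasker cannot rebuild RepeatMaskerLib.embl there when it runs under your account.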

ebhmayra commented 1 year ago

Hello Casey,

Thank you for your response. McClintock was installed centrally by the cluster administrator, who followed the same installation instructions (we shared the GitHub README with them). On our cluster, McClintock is installed within a conda environment. We know that they could not run the test data because of a RepeatMasker error, but we did not get further details.

Nevertheless, we tried to install it locally following the instructions, and got an error while creating the McClintock conda environment:

mamba env create -f install/envs/mcclintock.yml --name mcclintock

error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
warning  libmamba Could not parse state file: Could not load cache state: [json.exception.type_error.302] type must be string, but is null

bioconda/noarch                                      4.2MB @   5.0MB/s  0.9s
bioconda/linux-64                                    4.6MB @   4.7MB/s  1.0s
pkgs/r/noarch                                        1.3MB @   1.1MB/s  0.4s
pkgs/main/linux-64                                   5.3MB @   4.1MB/s  1.3s
pkgs/r/linux-64                                      1.4MB @   1.0MB/s  0.4s
pkgs/main/noarch                                   819.7kB @ 588.9kB/s  0.2s
conda-forge/noarch                                  11.5MB @   5.2MB/s  2.3s
conda-forge/linux-64                                30.2MB @   5.3MB/s  6.0s

Looking for: ['mamba=0.21.2', 'python=3.8.2', 'snakemake=5.32.0', 'biopython=1.77', 'git=2.23.0', 'unzip=6.0', 'patch=2.7.6', 'wgsim', 'seqtk', 'samtools', 'bedtools', 'art']

warning  libmamba Extracted package cache '/home/apps/conda/miniconda3/pkgs/libxml2-2.9.14-h74e7548_0' has invalid size
warning  libmamba Extracted package cache '/home/apps/conda/miniconda3/pkgs/libxml2-2.9.14-h74e7548_0' has invalid SHA-256 checksum

Then, when we installed the component methods, we got the following errors:

python3 mcclintock.py --install

INSTALL          Installing scripts for:relocate
CreateCondaEnvironmentException:
Could not create conda environment from /home/user/bricenohuayta/mcclintock/install/envs/mcc_relocate.yml:
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
Could not solve for environment specs
Encountered problems while solving:
  - package perl-lwp-protocol-https-6.06-1 requires openssl >=1.1.0,<=1.1.1, but none of the providers can be installed

The environment can't be solved, aborting the operation

  File "/home/user/bricenohuayta/.conda/envs/mcclintock/lib/python3.8/site-packages/snakemake/deployment/conda.py", line 350, in create
snakemake --use-conda --conda-frontend=mamba --conda-prefix /home/user/bricenohuayta/mcclintock/install//envs/conda --configfile /home/user/bricenohuayta/mcclintock/install//config.json --cores 1 --nolock /home/user/bricenohuayta/mcclintock/install/tools/relocate/scripts/relocaTE_insertionFinder.pl --quiet

INSTALL          Installing scripts for:temp
CreateCondaEnvironmentException:
Could not create conda environment from /home/user/bricenohuayta/mcclintock/install/envs/mcc_temp.yml:
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
error    libmamba Could not open lockfile '/home/apps/conda/miniconda3/pkgs/cache/cache.lock'
Could not solve for environment specs
Encountered problems while solving:
  - package perl-lwp-protocol-https-6.06-1 requires openssl >=1.1.0,<=1.1.1, but none of the providers can be installed

The environment can't be solved, aborting the operation

  File "/home/user/bricenohuayta/.conda/envs/mcclintock/lib/python3.8/site-packages/snakemake/deployment/conda.py", line 350, in create
snakemake --use-conda --conda-frontend=mamba --conda-prefix /home/user/bricenohuayta/mcclintock/install//envs/conda --configfile /home/user/bricenohuayta/mcclintock/install//config.json --cores 1 --nolock /home/user/bricenohuayta/mcclintock/install/tools/temp/scripts/TEMP_Insertion.sh --quiet

Finally, we tried to run McClintock on the test dataset and got an error related to "relocate":

python3 mcclintock.py -r test/sacCer2.fasta -c test/sac_cer_TE_seqs.fasta -g test/reference_TE_locations.gff -t test/sac_cer_te_families.tsv -1 test/SRR800842_1.fastq.gz -2 test/SRR800842_2.fastq.gz -p 4 -o /home/user/bricenohuayta/mcclintock

Traceback (most recent call last):
  File "mcclintock.py", line 986, in <module>
    main()
  File "mcclintock.py", line 36, in main
    check_installed_modules(args.methods, sysconfig.NO_INSTALL_METHODS, config_install.MD5, os.path.dirname(os.path.abspath(__file__))+"/install/")
  File "mcclintock.py", line 518, in check_installed_modules
    if installed_version[method] != method_md5s[method]:
KeyError: 'relocate'

Do you know what might be causing the libmamba errors and the subsequent relocate error?

All the best,

Mayra

cbergman commented 1 year ago

I believe you are using a system-wide installation of conda, which is trying to install mamba into the /home/apps/conda/miniconda3/ directory, which you do not have permission to write to. Can you please run which conda and report back the result?

If you are using a system-wide version of conda, I suggest you do a local installation of miniconda into your home directory, which will allow you to follow the installation instructions more exactly: https://github.com/bergmanlab/mcclintock#dependency

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O $HOME/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda # silent mode
echo "export PATH=\$PATH:\$HOME/miniconda/bin" >> $HOME/.bashrc # add to .bashrc
source $HOME/.bashrc
conda init
# logout and log back in
conda update -y conda
conda install -c conda-forge mamba

ebhmayra commented 1 year ago

Hi,

conda version : 23.1.0

I'll install miniconda.

Thanks, Mayra

cbergman commented 1 year ago
ebhmayra commented 1 year ago

Running which conda, I get the following:

conda ()
{ 
    \local cmd="${1-__missing__}";
    case "$cmd" in 
        activate | deactivate)
            __conda_activate "$@"
        ;;
        install | update | upgrade | remove | uninstall)
            __conda_exe "$@" || \return;
            __conda_reactivate
        ;;
        *)
            __conda_exe "$@"
        ;;
    esac
}

cbergman commented 1 year ago

Sorry, I assumed you were using bash, but it looks like you are using zsh. Can you report back the full command line response from whence -p conda?

ebhmayra commented 1 year ago

Hello,

whence -p conda says:

-bash: whence: command not found

Best, Mayra
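The -bash: prefix on that error shows the login shell is bash rather than zsh, so whence is unavailable. In bash the closest builtin is type: -P forces a PATH search (skipping the conda shell function that conda's shell hook defines) and -a lists every match in lookup order. On RHEL-family systems such as Oracle Linux, which is often set up as an alias that also prints shell functions, which would likely explain why the function body was printed above. A minimal sketch; the helper name show_conda is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which conda executables bash would find, in PATH order.
show_conda() {
    echo "conda executables on PATH (first one wins):"
    # -a: all matches, -P: force a PATH search even though 'conda' is a shell function
    type -aP conda || echo "  none found"
    # CONDA_EXE is set by conda's shell hook to the binary that the 'conda' function wraps
    echo "CONDA_EXE=${CONDA_EXE:-<not set>}"
}

show_conda
```

On this cluster the first line after the header would be expected to show which installation (system-wide or the one in your home directory) is actually being run.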

ebhmayra commented 1 year ago

Dear @cbergman

When we installed Miniconda, everything went fine until we ran conda init; then we got the error below.

For some reason, it is calling the conda that is already installed on the cluster. The environment variables (CONDA_PREFIX, CONDA_ROOT, etc.) all point to the existing cluster-wide conda/miniconda installation, and so does the active env location. We also have two env directories (/home/user/bricenohuayta/.conda/envs and /home/apps/conda/miniconda3/envs). How can I select one of these directories, or point conda init at it?

Best, Mayra

[sudo] password for bricenohuayta: 

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/exceptions.py", line 1124, in __call__
        return func(*args, **kwargs)
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/cli/main.py", line 69, in main_subshell
        exit_code = do_call(args, p)
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/cli/conda_argparse.py", line 91, in do_call
        return getattr(module, func_name)(args, parser)
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/cli/main_init.py", line 33, in execute
        return initialize(context.conda_prefix, selected_shells, for_user, args.system,
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/core/initialize.py", line 118, in initialize
        run_plan_elevated(plan2)
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/core/initialize.py", line 714, in run_plan_elevated
        result = subprocess_call(
      File "/home/apps/conda/miniconda3/lib/python3.9/site-packages/conda/gateways/subprocess.py", line 98, in subprocess_call
        stdout, stderr = process.communicate(input=stdin)
      File "/home/apps/conda/miniconda3/lib/python3.9/subprocess.py", line 1134, in communicate
        stdout, stderr = self._communicate(input, endtime, timeout)
      File "/home/apps/conda/miniconda3/lib/python3.9/subprocess.py", line 1959, in _communicate
        input_view = memoryview(self._input)
    TypeError: memoryview: a bytes-like object is required, not 'str'

`$ /home/apps/conda/miniconda3/bin/conda.conda init`

  environment variables:
                 CIO_TEST=<not set>
        CONDA_DEFAULT_ENV=base
                CONDA_EXE=/home/apps/conda/miniconda3/bin/conda
             CONDA_PREFIX=/home/apps/conda/miniconda3
    CONDA_PROMPT_MODIFIER=
         CONDA_PYTHON_EXE=/home/apps/conda/miniconda3/bin/python
               CONDA_ROOT=/home/apps/conda/miniconda3
              CONDA_SHLVL=1
           CURL_CA_BUNDLE=<not set>
          LD_LIBRARY_PATH=/home/apps/lua/5.4.4/lib:/home/apps/slurm/23.02.0/lib/slurm:/home/apps
                          /slurm/23.02.0/lib
               LD_PRELOAD=<not set>
                  MANPATH=/home/apps/lua/5.4.4/man::/home/apps/slurm/23.02.0/share/man
               MODULEPATH=/etc/scl/modulefiles:/home/apps/modulefiles/visualisation:/home/apps/m
                          odulefiles/system:/home/apps/modulefiles/sequenceanalysis:/home/apps/m
                          odulefiles/rnatools:/home/apps/modulefiles/proteomics:/home/apps/modul
                          efiles/phylogeny:/home/apps/modulefiles/ngstools:/home/apps/modulefile
                          s/networktools:/home/apps/modulefiles/metagenomics:/home/apps/modulefi
                          les/machinelearning:/home/apps/modulefiles/genetics:/home/apps/modulef
                          iles/development:/home/apps/modulefiles/assembly:/home/apps/modulefile
                          s/amplicons:/home/apps/modulefiles
                     PATH=/home/apps/conda/miniconda3/bin:/home/apps/conda/miniconda3/condabin:/
                          home/apps/lisc/default/bin:/home/apps/lua/5.4.4/bin:/home/apps/slurm/2
                          3.02.0/bin:/home/apps/module/5.2.0/bin:/usr/local/bin:/usr/bin:/usr/lo
                          cal/sbin:/usr/sbin:/home/user/bricenohuayta/miniconda/bin:/home/user/b
                          ricenohuayta/miniconda/bin:/home/user/bricenohuayta/miniconda/bin:/hom
                          e/user/bricenohuayta/miniconda/bin:/home/user/bricenohuayta/miniconda/
                          bin:/home/user/bricenohuayta/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
  __MODULES_SHARE_MANPATH=:2

     active environment : base
    active env location : /home/apps/conda/miniconda3
            shell level : 1
       user config file : /home/user/bricenohuayta/.condarc
 populated config files : /home/user/bricenohuayta/.condarc
          conda version : 23.1.0
    conda-build version : not installed
         python version : 3.9.16.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.1=0
                          __glibc=2.28=0
                          __linux=4.18.0=0
                          __unix=0=0
       base environment : /home/apps/conda/miniconda3  (read only)
      conda av data dir : /home/apps/conda/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/apps/conda/miniconda3/pkgs
                          /home/user/bricenohuayta/.conda/pkgs
       envs directories : /home/user/bricenohuayta/.conda/envs
                          /home/apps/conda/miniconda3/envs
               platform : linux-64
             user-agent : conda/23.1.0 requests/2.28.2 CPython/3.9.16 Linux/4.18.0-425.13.1.el8_7.x86_64 ol/8.7 glibc/2.28
                UID:GID : 11514:10000
             netrc file : None
           offline mode : False

An unexpected error has occurred. Conda has prepared the above report.

If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers? [y/N]: 
No report sent. To permanently opt-out, use

    $ conda config --set report_errors false

cbergman commented 1 year ago

I agree that the problems you have been having are related to how conda has been installed system-wide on your machine. The original RepeatMasker errors arose because McClintock was installed in a non-writable directory of the system-wide conda installation. The current problem with conda init is that your default user environment is still using the system-wide conda, when it should be using the conda installed in your home directory.

To help troubleshoot what is going on with your machine, could you report back the full output of the following two commands: lsb_release -a and echo $PATH?

In advance of seeing the output of these commands, I suspect that changing the order in which your .bashrc modifies the PATH variable should allow you to use the local version of conda, i.e.:

1) delete the line export PATH=$PATH:$HOME/miniconda/bin from .bashrc

2) change the order of the PATH modification in .bashrc (prepend your miniconda/bin directory instead of appending it), reload .bashrc, and then run conda init:

echo "export PATH=\$HOME/miniconda/bin:\$PATH" >> $HOME/.bashrc # add to .bashrc
source $HOME/.bashrc
conda init
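Because ~/.bashrc may be sourced many times, a plain export line that appends or prepends also accumulates duplicate PATH entries each time. A minimal bash sketch of an idempotent alternative, assuming you are happy to define a small helper in ~/.bashrc (the name path_prepend is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a directory to the front of PATH, dropping any
# existing copies so repeated 'source ~/.bashrc' calls do not add duplicates.
path_prepend() {
    local dir="$1" entry new=""
    local -a parts
    # split PATH on ':' into an array without glob expansion
    IFS=':' read -r -a parts <<< "$PATH"
    for entry in "${parts[@]}"; do
        # keep every entry except the one being moved to the front
        [ "$entry" = "$dir" ] || new="${new:+$new:}$entry"
    done
    export PATH="$dir${new:+:$new}"
}

path_prepend "$HOME/miniconda/bin"
```

Used in ~/.bashrc in place of the echoed export line, this keeps $HOME/miniconda/bin at the front of PATH however many times the file is sourced.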
ebhmayra commented 1 year ago

Thank you for your response.

Here the output of the two commands:

lsb_release -a

LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: OracleServer
Description:    Oracle Linux Server release 8.7
Release:    8.7
Codename:   n/a
echo $PATH

/home/apps/conda/miniconda3/bin:/home/apps/conda/miniconda3/condabin:/home/apps/lisc/default/bin:/home/apps/lua/5.4.4/bin:/home/apps/slurm/23.02.0/bin:/home/apps/module/5.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/user/bricenohuayta/miniconda/bin:/home/user/bricenohuayta/miniconda/bin:/home/user/bricenohuayta/miniconda/bin:/home/user/bricenohuayta/miniconda/bin:/home/user/bricenohuayta/miniconda/bin:/home/user/bricenohuayta/bin

Mayra

cbergman commented 1 year ago

From the output of your echo $PATH command, you can see that your system-wide conda is being picked up before the conda installed in your user directory (i.e. /home/apps/conda/miniconda3/bin is listed before /home/user/bricenohuayta/miniconda/bin). Please follow the steps I recommended in https://github.com/bergmanlab/mcclintock/issues/107#issuecomment-1470114896 to modify your .bashrc so that the conda in your user directory is picked up before the system-wide conda. Hopefully this will allow you to run conda init successfully.
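To see the lookup order at a glance, the following bash sketch prints each PATH entry on its own numbered line and flags the directories that contain a conda executable; the first flagged entry is the conda the shell will run. The helper name show_path_order is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PATH entries in lookup order and flag the
# directories that contain an executable named 'conda'.
show_path_order() {
    local -a parts
    local i entry mark
    IFS=':' read -r -a parts <<< "$PATH"
    for i in "${!parts[@]}"; do
        entry="${parts[$i]}"
        mark=""
        [ -x "$entry/conda" ] && mark="  <- conda"
        printf '%2d %s%s\n' "$((i + 1))" "$entry" "$mark"
    done
}

show_path_order
```

With the PATH reported above, the first flagged entry would presumably be /home/apps/conda/miniconda3/bin (matching CONDA_EXE in the conda error report), which is exactly the ordering problem described here.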

ebhmayra commented 1 year ago

Hi @cbergman

We fixed the issue by reinstalling McClintock on the cluster. The test files needed to be run once as the appadmin user, which created the database indexes; after that, the test dataset ran successfully.

Thank you very much for your support. Best, Mayra