Tutorial Error #22

Closed jkalleberg closed 1 year ago

jkalleberg commented 1 year ago


I'm sorry in advance for the wall of text; however, I've repeatedly tried to build the environment without success and wanted to be thorough for reproducibility. I would greatly appreciate any help!

To avoid long local paths, I use the following example paths throughout: I cloned the git repo to /root/path/cue and ran the tutorial from WORKING_DIR=/root/path/project_dirname.

I am getting the following error while trying to run the tutorial:

Output (edited to have ${WORKING_DIR} instead of the long absolute path):

*  cue (v0.2.2): discovery mode *
[INFO]  ========== Model config ==========
        model_path: ${WORKING_DIR}/data/demo/models/cue.pt
        n_cpus: 1
        gpu_ids: []
        batch_size: 16
        logging_level: INFO
        report_interval: 50
        n_jobs_per_gpu: 1
        signal_set: SV_SIGNAL_SET.SHORT
        class_set: SV_CLASS_SET.BASIC5ZYG
        num_keypoints: 1
        model_architecture: HG
        image_dim: 256
        sigma: 10
        stride: 4
        heatmap_peak_threshold: 0.4
        pretrained_refinenn_path: None
        config_file: ${WORKING_DIR}/data/demo/results/model.yaml
        experiment_dir: ${WORKING_DIR}/data/demo/results
        devices: [device(type='cpu')]
        device: cpu
        log_dir: ${WORKING_DIR}/data/demo/results/logs/
        report_dir: ${WORKING_DIR}/data/demo/results/reports/
        log_file: ${WORKING_DIR}/data/demo/results/logs/main.log
        classes: ['NEG', 'DEL-HOM', 'INV-HOM', 'DUP-HOM', 'DEL-HET', 'INV-HET', 'DUP-HET', 'IDUP-HOM', 'IDUP-HET']
        num_classes: 9
        n_signals: 6
[INFO] ========== Data config =========
        bam: ${WORKING_DIR}/data/demo/inputs/chr21.small.bam
        fai: ${WORKING_DIR}/data/demo/inputs/GRCh38.fa.fai
        chr_names: ['chr21']
        n_cpus: 1
        logging_level: INFO
        min_refine_buffer: 2000
        refine_buffer_frac_size: 5
        refine_pair_dist_frac_size: 2
        refine_bp_kernels: [0, 50, 500]
        refine_min_support: 2
        refine_disable: False
        min_pair_support: 2
        min_pair_distance: 4000
        max_pair_distance: 1000000
        scan_target_intervals: True
        stream: True
        view_mode: False
        store_img: False
        empty_annotation: False
        bins_per_block: 8000
        min_sv_len: 4000
        min_qual_score: 50
        bam_type: BAM_TYPE.SHORT
        signal_set: SV_SIGNAL_SET.SHORT
        signal_set_origin: SHORT
        bed: None
        blacklist_bed: None
        signal_vmax: {'RD': 600, 'RD_LOW': 800, 'RD_CLIPPED': 600, 'SM': 200, 'SR_RP': 600, 'LR': 600, 'LLRR': 100, 'RL': 100, 'LLRR_VS_LR': 1}
        signal_mapq: {'RD': 20, 'RD_LOW': 0, 'RD_CLIPPED': 20, 'SM': 20, 'SR_RP': 0, 'LR': 0, 'LLRR': 1, 'RL': 1, 'LLRR_VS_LR': 1}
        bin_size: 750
        interval_size: 150000
        step_size: 50000
        shift_size: None
        heatmap_dim: 1000
        image_dim: 256
        class_set: SV_CLASS_SET.BASIC5ZYG
        num_keypoints: 1
        bbox_padding: 0
        config_file: ${WORKING_DIR}/data/demo/results/data.yaml
        dataset_dir: ${WORKING_DIR}/data/demo/results
        info_dir: ${WORKING_DIR}/data/demo/results/info/
        image_dir: ${WORKING_DIR}/data/demo/results/images/
        annotation_dir: ${WORKING_DIR}/data/demo/results/annotations/
        classes: ['NEG', 'DEL-HOM', 'INV-HOM', 'DUP-HOM', 'DEL-HET', 'INV-HET', 'DUP-HET', 'IDUP-HOM', 'IDUP-HET']
        num_classes: 9
        num_signals: 6
        uid: 0000000000
        log_file: ${WORKING_DIR}/data/demo/results/info/main.log
[INFO] Running on 1 CPUs/GPUs
[INFO] Chromosome lists processed by each process: [array(['chr21'], dtype='<U5')]
Traceback (most recent call last):
  File "../cue/engine/call.py", line 105, in <module>
    delayed(call)(config.devices[i], chr_name_chunks[i], scan_id) for i in range(n_procs))
  File "${WORKING_DIR}/miniconda_envs/dev/lib/python3.7/site-packages/joblib/parallel.py", line 1855, in __call__
    return output if self.return_generator else list(output)
  File "${WORKING_DIR}/miniconda_envs/dev/lib/python3.7/site-packages/joblib/parallel.py", line 1784, in _get_sequential_output
    res = func(*args, **kwargs)
  File "../cue/engine/call.py", line 73, in call  
    model.load_state_dict(torch.load(config.model_path, device))
  File "${WORKING_DIR}/miniconda_envs/dev/lib/python3.7/site-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "${WORKING_DIR}/miniconda_envs/dev/lib/python3.7/site-packages/torch/serialization.py", line 763, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'. 

Tutorial Bash Script

# scripts/setup/tutorial.sh

echo -e "=== scripts/setup/tutorial.sh > start $(date)"

# add directories if missing
mkdir -p data/demo/inputs
mkdir -p data/demo/ground_truth
mkdir -p data/demo/models
mkdir -p data/demo/results

# copy tutorial raw data files

declare -a FILES=("inputs/chr21.small.bam" "inputs/chr21.small.bam.bai" "inputs/GRCh38.fa.fai" "ground_truth/svs.chr21.small.sorted.vcf.gz" "ground_truth/svs.chr21.small.sorted.vcf.gz.tbi" "models/cue.pt")

for f in "${FILES[@]}"; do
    if [ ! -f "./data/demo/${f}" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: downloading a file now... | '${CUE_DIR}/${f}'"
        curl -s --continue-at - "${CUE_DIR}/${f}" -o "./data/demo/${f}"
        echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: file found | './data/demo/${f}'"
    fi
done

# copy configuration files for demo
cp ../cue/data/demo/config/data.yaml data/demo/results/.
cp ../cue/data/demo/config/model.yaml data/demo/results/.

export WORKING_DIR=$(pwd)
## THEN, manually edit copied files to replace '..' with 'WORKING_DIR' path 

# Activate the environment -----------------
conda activate ./miniconda_envs/dev/

# Adding path to Cue -----------------------
ROOT_DIR=$(dirname $(pwd))

python ../cue/engine/call.py --data_config ${WORKING_DIR}/data/demo/results/data.yaml --model_config ${WORKING_DIR}/data/demo/results/model.yaml

I initially assumed my package/dependency versions were introducing a syntax error. However, pinning the exact version numbers from requirements.txt did not help. I then rebuilt the conda env repeatedly and eventually got a successful build by specifying Python 3.7 and torchvision 0.6 with the following bash script, but I still get the error:

# scripts/setup/build_env.sh

echo -e "=== scripts/setup/build_env.sh > start $(date)"

##--- NOTE: ----##
##  You must have an interactive session
##  with more mem than defaults to work!

if [ ! -d ./miniconda_envs/dev ] ; then
     # If missing an environment called "dev",
     # initialize this env with only the anaconda package
     conda create --yes --prefix ./miniconda_envs/dev
fi

# Then, activate the base environment to enable 'conda activate'
source ${CONDA_BASE}/etc/profile.d/conda.sh
conda deactivate

##--- Configure an environment-specific .condarc file ---##
## NOTE: Only performed once:
# Changes the (env) prompt to avoid printing the full path
conda config --env --set env_prompt '({name})'

# Put the package download channels in a specific order
conda config --env --add channels defaults
conda config --env --add channels bioconda
conda config --env --add channels conda-forge

# Download packages flexibly
conda config --env --set channel_priority flexible

# Install the project-specific packages in the env
conda install -p ./miniconda_envs/dev -y pip python=3.7 numpy scipy pyfaidx pysam pytabix truvari opencv

conda install -p ./miniconda_envs/dev -y bitarray cachetools intervaltree joblib matplotlib pycocotools python-dateutil pyyaml seaborn

conda install -p ./miniconda_envs/dev -y python-dotenv regex natsort mkdocs mkdocs-material black 

# using a private channel to obtain outdated versions required by Cue
conda install -p ./miniconda_envs/dev -y -c zeus1942 torchvision=0.6 pytorch

###===== Notes about specific packages =====###
### Python = Cue docs specify Python3.7
### Scipy = scientific libraries for Python
### DotEnv = enables environment variable configuration across bash and python
### Regex = required for update regular expression handling
### Natsort = enables sorting of file iterators
### Mkdocs & Mkdocs-Material = used for writing Github documentation
### Black = Python formatter
### REMAINING PACKAGES ARE CUE REQUIREMENTS OBTAINED FROM | https://github.com/PopicLab/cue/blob/551cb72f7b9f4177c9ac71743f8ab7cf3d7f28dc/install/requirements.txt

echo -e "=== scripts/setup/build_env.sh > end $(date)"

My conda environment includes the following:

I also attempted to install packages using pip within a fresh conda env using the following:

if [ ! -d ./miniconda_envs/dev ] ; then
     # If missing an environment called "dev",
     # initialize this env with only the anaconda package
     conda create --yes --prefix ./miniconda_envs/dev
fi

# Then, activate the base environment to enable 'conda activate'
source ${CONDA_BASE}/etc/profile.d/conda.sh
conda deactivate

##--- Configure an environment-specific .condarc file ---##
## NOTE: Only performed once:
# Changes the (env) prompt to avoid printing the full path
conda config --env --set env_prompt '({name})'

# Put the package download channels in a specific order
conda config --env --add channels defaults
conda config --env --add channels bioconda
conda config --env --add channels conda-forge

# Download packages flexibly
conda config --env --set channel_priority flexible

# Install the project-specific packages in the env
conda install -p ./miniconda_envs/dev -y pip python=3.7

# activate env
. scripts/start_dev.sh

pip install cython

pip --no-cache-dir install -r ../cue/install/requirements.txt

But got the following error:

${WORKING_DIR}/miniconda_envs/dev/lib/python3.7/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `

              The license_file parameter is deprecated, use license_files instead.

              By 2023-Oct-30, you need to update your project and remove deprecated calls
              or your builds will no longer be supported.

              See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
AttributeError: cython_sources
      [end of output]

 note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

jkalleberg commented 1 year ago

I think I figured it out and wanted to share the fix for anyone else dealing with this in the future. My conda env was fine; the demo files downloaded with curl were not. I re-downloaded them using wget:


declare -a FILES=("inputs/chr21.small.bam" "inputs/chr21.small.bam.bai" "inputs/GRCh38.fa.fai" "ground_truth/svs.chr21.small.sorted.vcf.gz" "ground_truth/svs.chr21.small.sorted.vcf.gz.tbi" "models/cue.pt")

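for f in "${FILES[@]}"; do
    if [ ! -f "./data/demo/${f}" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: downloading a file now... | '${CUE_DIR}/${f}'"
        # top-level subdirectory of the file (inputs, ground_truth, or models)
        PREFIX=$(echo "${f}" | cut -d "/" -f 1)
        echo "./data/demo/${PREFIX}"
        # NOTE: AWS_FILE_NAME is not set in this snippet; it must hold the URL of the current file
        wget --directory-prefix="./data/demo/${PREFIX}" "${AWS_FILE_NAME}"
        echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: file found | './data/demo/${f}'"
    fi
done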
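For anyone who wants to check their own downloads before re-running the tutorial: the "invalid load key, '<'" message means the first byte of the model file was '<', which a binary file such as cue.pt or a BAM would not normally start with, so the download was probably a saved HTML or XML response. Below is a minimal sketch of that check. The function name looks_like_markup and the 64-byte read size are my own choices for illustration and are not part of Cue.

```python
import sys


def looks_like_markup(path, n_bytes=64):
    """Return True if the file's first non-whitespace byte is '<'.

    This is a heuristic only: it flags files that begin like an HTML or
    XML document, which binary model or BAM files are not expected to do.
    """
    with open(path, "rb") as fh:
        head = fh.read(n_bytes)
    # bytes.lstrip() with no argument removes leading ASCII whitespace
    return head.lstrip().startswith(b"<")


if __name__ == "__main__":
    # Usage: python check_downloads.py data/demo/models/cue.pt ...
    for name in sys.argv[1:]:
        if looks_like_markup(name):
            print(f"{name}: starts with '<', probably not the real file")
        else:
            print(f"{name}: does not start with '<'")
```

A file passing this check is not guaranteed to be intact; it only rules out the specific failure reported in the traceback.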

Re-downloading got me past the UnpicklingError, but then I ran into another error:

Traceback (most recent call last):
  File "../cue/engine/call.py", line 107, in <module>
    joblib.delayed(call)(config.devices[i], chr_name_chunks[i], scan_id) for i in range(n_procs))
  File "/storage/hpc/group/UMAG_test/WORKING/jakth2/SVS_230718/miniconda_envs/dev/lib/python3.7/site-packages/joblib/parallel.py", line 1855, in __call__
    return output if self.return_generator else list(output)
  File "/storage/hpc/group/UMAG_test/WORKING/jakth2/SVS_230718/miniconda_envs/dev/lib/python3.7/site-packages/joblib/parallel.py", line 1784, in _get_sequential_output
    res = func(*args, **kwargs)
  File "../cue/engine/call.py", line 90, in call  
    collect_data_metrics=True, given_ground_truth=given_ground_truth)
  File "/storage/hpc/group/UMAG_test/WORKING/jakth2/cue/engine/core.py", line 98, in evaluate
  File "/storage/hpc/group/UMAG_test/WORKING/jakth2/cue/img/plotting.py", line 202, in plot_images
    image = annotate(image, targets[indices[i]], classes, display_boxes=True, color=(0, 76 / 255, 153 / 255))
  File "/storage/hpc/group/UMAG_test/WORKING/jakth2/cue/img/plotting.py", line 161, in annotate
    cv2.circle(image, (p[0], p[1]), int(image_dim/100), color=color, thickness=-thickness)
cv2.error: OpenCV(4.6.0) :-1: error: (-5:Bad argument) in function 'circle'
> Overload resolution failed:
>  - Can't parse 'center'. Sequence item with index 0 has a wrong type
>  - Can't parse 'center'. Sequence item with index 0 has a wrong type

To fix this, I changed both instances of cv2.circle(image, (p[0], p[1]) in plotting.py to cv2.circle(image, (int(p[0]), int(p[1])).
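As a minimal sketch of that cast, independent of OpenCV: cv2.circle expects the center as a pair of integers, and the error above suggests the coordinates in p were some other numeric type (presumably floats). The helper name as_pixel is hypothetical and does not exist in plotting.py; I simply inlined the int() calls there.

```python
def as_pixel(point):
    """Cast the first two coordinates of a point to plain Python ints."""
    return int(point[0]), int(point[1])


center = as_pixel((12.0, 34.7))
print(center)  # (12, 34)

# In plotting.py the equivalent inline change would look like:
#   cv2.circle(image, (int(p[0]), int(p[1])), ...)
```

Note that int() truncates toward zero rather than rounding, so 34.7 becomes 34; for drawing a keypoint marker that difference is at most one pixel.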

Finally, I used the bug fix in Issue #21 to get the tutorial to run.