HCGB-IGTP / XICRA

Small RNAseq pipeline for paired-end reads
MIT License

NO samples were retrieved #42

Open vetmohit89 opened 1 year ago

vetmohit89 commented 1 year ago

(XICRA) [mbansal@login004 teton1]$ XICRA prep -i ./teton1/ -o ./XICRA_Prep/ --debug

######################################################################

XICRA pipeline

Jose F. Sanchez & Lauro Sumoy

Copyright (C) 2019-2021 Lauro Sumoy Lab, IGTP, Spain

######################################################################

|==================================================|
|                Preparing samples                 |
|==================================================|

--------- Starting Process --------- 01/25/2023, 18:47:21


DEBUG: sampleParser.get_files files list to check
DO NOT PRINT THIS LIST: It could be very large...
set()

DEBUG: select_samples non_duplicate_names: []
DEBUG: select_samples non_duplicate_names: set()
samples_prefix {'.*'}
non_duplicate_samples []
tmp dataframe
Empty DataFrame
Columns: [sample, file]
Index: []
Empty DataFrame
Columns: [sample, file]
Index: []
** DEBUG: select_samples name_frame_samples:
Empty DataFrame
Columns: [sample, dirname, name, new_name, name_len, lane, read_pair, lane_file, ext, gz, tag, file]
Index: []
number_files: 0
total_samples: set()

**ERROR: No samples were retrieved. Check the input provided

I have used the same extension for the paired-end files, but it keeps giving me the same error. Please help.
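The debug output above shows an empty file set (`set()`) before any sample parsing happens, which suggests that nothing in the input folder matched. A quick standalone check, sketched here in Python, lists the files whose names end in the extensions XICRA reports elsewhere in this thread (fastq, fq, fastq.gz, fq.gz). This is not XICRA's own file-selection code: the function name `list_fastq_files` and the recursive search are assumptions made for illustration.

```python
from pathlib import Path

# Extensions copied from the "+ Extension:" line that XICRA prints in this thread.
FASTQ_EXTENSIONS = ("fastq", "fq", "fastq.gz", "fq.gz")
# Leading dots so that a name such as "notes_fq" is not matched by accident.
FASTQ_SUFFIXES = tuple("." + ext for ext in FASTQ_EXTENSIONS)


def list_fastq_files(folder):
    """Return a sorted list of FASTQ-like files under `folder` (hypothetical helper).

    Returns an empty list when `folder` does not exist or is not a directory,
    which mirrors the "nothing found" situation in the log above.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")  # recursive search is an assumption
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES)
    )
```

Calling `list_fastq_files("./teton1/")` from the same working directory as the failing command shows whether any candidate files are visible at that path. Note that the shell prompt above shows the command being run from inside a directory named teton1 while the input is ./teton1/, so it may be worth confirming that the relative path resolves where expected.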

JFsanchezherrero commented 1 year ago

Hi there, Thank you very much for pointing it out.

Have you tried the subset example that I provide, either in the git repo or downloaded using XICRA test?

Thanks in advance.

vetmohit89 commented 1 year ago

Yes. It gives an error.

I used the following command:

XICRA prep -i ./subset_PE/ -o XICRA_analysis_PE

and it gives the error below:

ERROR [ System: ln -s /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/subset_PE/rep_2_R2.fq.gz /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_2/raw/rep_2_R2.fq.gz ]
ln: failed to create symbolic link ‘/data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_2/raw/rep_2_R2.fq.gz’: File exists
ERROR b''
ERROR [ System: ln -s /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/subset_PE/rep_3_R2.fq.gz /data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_3/raw/rep_3_R2.fq.gz ]
ln: failed to create symbolic link ‘/data/user/mbansal/AG_DKC1_rawdata/smallrnaseq/XICRA/subset2test/XICRA_analysis_PE/data/rep_3/raw/rep_3_R2.fq.gz’: File exists
ERROR b''
ERROR

JFsanchezherrero commented 1 year ago

Hi there,

Basically, this is printed as an error, but it is just a warning: it says the symbolic link cannot be created because it already exists. Can you continue with the pipeline?

Type:

XICRA QC -i XICRA_analysis_PE
XICRA join -i XICRA_analysis_PE --noTrim 
XICRA miRNA -i XICRA_analysis_PE --software miraligner

Thanks in advance Regards
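The "File exists" message from `ln -s` appears whenever the prep step is run a second time on the same output folder. A minimal sketch of linking that tolerates reruns is shown here in Python. It is not XICRA's prep code: `link_sample` is a hypothetical helper, and replacing a stale link is a design choice assumed for this example.

```python
import os


def link_sample(src, dst):
    """Create `dst` as a symlink to `src`, tolerating reruns (hypothetical helper).

    Returns "linked" when a new link is created and "already linked" when
    `dst` is already a symlink pointing at `src`.
    """
    if os.path.islink(dst):
        if os.readlink(dst) == src:
            return "already linked"  # rerun on the same output folder: nothing to do
        os.remove(dst)  # stale link that points somewhere else
    elif os.path.exists(dst):
        # A regular file or directory is in the way; do not overwrite it silently.
        raise FileExistsError(f"{dst} exists and is not a symlink")
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    os.symlink(src, dst)
    return "linked"
```

With a helper like this, the first run reports "linked" and any later run on the same folder reports "already linked" instead of an error.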

kubu4 commented 10 months ago

I get the same error when running sh test_subset.sh.

Test data was retrieved via XICRA test.

There's a Traceback message thrown almost immediately. Then, a KeyError: 'sample_8' shortly thereafter.

Here's a truncated version of the output (but all modules fail with the "No samples were retrieved..." message):

$ sh test_subset.sh
Thu Feb  1 12:15:50 PST 2024 ... Starting ...

# ------------------------------ #
XICRA prep -i ./subset_SE/ -o XICRA_analysis --single_end
...

Traceback (most recent call last):
  File "/home/sam/programs/mambaforge/envs/XICRA/bin/XICRA", line 396, in <module>

######################################################################
#                           XICRA pipeline                           #
#                   Jose F. Sanchez & Lauro Sumoy                    #
#        Copyright (C) 2019-2022 Lauro Sumoy Lab, IGTP, Spain        #
######################################################################

|==================================================|
|                Preparing samples                 |
|==================================================|

--------- Starting Process ---------
    02/01/2024, 12:15:51

+ Create output folder(s):
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis 
+ Generate a directory containing information within the project folder provided
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/info 

--------------------------------------------------
+ Getting files from input folder... 
+ Mode: fastq.
+ Extension: 
[ fastq, fq, fastq.gz, fq.gz ]

+ Input folder exists
    10 files selected...
    10 samples selected...
    Single end mode selected...
-------------------------
(Time spent: 0 h 0 min 0 s)
-------------------------
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/data 
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/data/s 
Successfully created the directory /home/shared/8TB_HDD_01/sam/analyses/20240201-xicra-test/XICRA_analysis/data/s/raw 
+ Sample files will be linked...
    args.func(args)
  File "/home/sam/programs/mambaforge/envs/XICRA/lib/python3.7/site-packages/XICRA/modules/prep.py", line 225, in run_prep
    os.path.join(outdir_dict[row['new_name']], row['new_file']))
KeyError: 'sample_8'

# ------------------------------ #
XICRA QC -i XICRA_analysis --single_end --threads 4
...

######################################################################
#                           XICRA pipeline                           #
#                   Jose F. Sanchez & Lauro Sumoy                    #
#        Copyright (C) 2019-2022 Lauro Sumoy Lab, IGTP, Spain        #
######################################################################

|==================================================|
|                  Quality check                   |
|==================================================|

--------- Starting Process ---------
    02/01/2024, 12:15:52

|==================================================|
|         FASTQC Quality check for samples         |
|==================================================|

+ Getting files from input folder... 
+ Mode: fastq.
+ Extension: 
[ fastq, fq, fastq.gz, fq.gz ]

+ Input folder exists

**ERROR: No samples were retrieved. Check the input provided

JFsanchezherrero commented 10 months ago

Hi there, I guess I was making some changes in a supplementary package where many functions are stored, and some modifications might not have been thoroughly tested.

I will have a look, update and fix the bugs and let you know shortly.

Best regards
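The traceback above fails on the lookup `outdir_dict[row['new_name']]`, and the log shows a directory data/s being created rather than one named after a sample. A small diagnostic, sketched here in Python, compares the names in the sample table against the keys of the output-directory mapping before any linking is attempted. The data below is hypothetical and only shaped like the log (plain dicts stand in for the real data structures); the helper `missing_output_dirs` and the file name are assumptions, and this says nothing about the actual cause inside the package.

```python
def missing_output_dirs(outdir_dict, rows):
    """Return sample names used in `rows` that have no key in `outdir_dict`."""
    wanted = {row["new_name"] for row in rows}
    return sorted(wanted - set(outdir_dict))


# Hypothetical values: the log shows data/s/raw being created, while the
# KeyError refers to 'sample_8'. The file name is made up for illustration.
outdir_dict = {"s": "XICRA_analysis/data/s/raw"}
rows = [{"new_name": "sample_8", "new_file": "sample_8.fastq.gz"}]

print(missing_output_dirs(outdir_dict, rows))  # prints ['sample_8']
```

A non-empty result means the sample table and the created folders disagree, which is the condition that surfaces as a KeyError at the lookup.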

kubu4 commented 10 months ago

Thanks so much for the quick response and update. Much appreciated! Looking forward to giving this tool a try!

JFsanchezherrero commented 9 months ago

Hi there, I have updated it and I think it should be working now. Please give it a try and let me know if it works.

I encourage you to create a fresh environment and follow the steps in https://github.com/HCGB-IGTP/XICRA

## Install

# get environment yml configuration
wget https://raw.githubusercontent.com/HCGB-IGTP/XICRA/master/XICRA_pip/devel/conda/environment.yml

conda env create -f environment.yml

# activate
conda activate XICRA_env

# install latest python code
pip install XICRA

# install missing software
wget https://raw.githubusercontent.com/HCGB-IGTP/XICRA/master/XICRA_pip/XICRA/config/software/installer.sh
sh installer.sh

Get test datasets:

XICRA test

This command downloads three different datasets into your directory: single-end, paired-end and tRNA-enriched datasets. Also, a script named

You can run the script with sh test_subset.sh, which runs the whole analysis for all datasets, or go step by step, either by checking the contents of the file or by following the XICRA workflow:

# Single end
XICRA prep -i ./subset_SE/ -o XICRA_analysis --single_end 
XICRA QC -i XICRA_analysis --single_end --threads 4 
XICRA trim -i XICRA_analysis --single_end --threads 4 --adapters_a TGGAATTCTCGGGTGCCAAGG
XICRA miRNA -i XICRA_analysis --single_end --threads 4 --software miraligner 
# Paired end
XICRA prep -i ./subset_PE/ -o XICRA_analysis_PE
XICRA QC -i XICRA_analysis_PE --threads 4
XICRA join -i XICRA_analysis_PE --threads 4 --noTrim
XICRA miRNA -i XICRA_analysis_PE --threads 4 --software miraligner
# tRNA single end
XICRA prep -i ./subset_tRNA/ -o XICRA_analysis_tRNA --single_end
XICRA QC -i XICRA_analysis_tRNA --single_end --threads 4 
XICRA tRNA -i XICRA_analysis_tRNA --noTrim --single_end --threads 4 --software mintmap 

Unfortunately, due to recent Python updates and some limitations, optimir has been dropped from the miRNA software options, and MINTmap, used in the tRNA module, might need further debugging.

kubu4 commented 8 months ago

Thanks for the update. Seems like this specific error has been resolved!