NoamShental / 5R

SMURF reconstruction of sequencing results based on the 5R protocol

Having trouble getting 5R to work on NCBI downloaded Nejman et al. data #2

Open gregpoore opened 3 years ago

gregpoore commented 3 years ago

Hi Noam,

Our lab downloaded the raw Nejman et al. files hosted on NCBI (PRJNA624822) and has been trying to re-run them through 5R to get taxonomy assignments. Although I've gotten the GitHub example fastq files running on our server with the expected outputs, the NCBI-downloaded files consistently fail with 0 reads detected. To troubleshoot, I unzipped the fastq files and tried renaming them in a style similar to the GitHub example files, but neither step helped. The four fastq files I was testing on can be found at the Google Drive link below, and the error output is attached below as well. Do you know what might be happening?

Google Drive link to example fastq files: https://drive.google.com/drive/folders/1l9N8Gsok2TNgoZI3mB7f4UBWv17DE15-?usp=sharing
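Since the failure is reported as 0 reads, one quick sanity check is to count the records in each fastq file independently of 5R, which would show whether the downloaded files themselves are empty. The following is a minimal sketch, not part of 5R: the helper name `count_fastq_reads` is hypothetical, and it assumes the standard layout of 4 lines per record with no blank lines.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record.

    Hypothetical helper for sanity-checking input files; not part of 5R.
    """
    path = Path(path)
    # Read gzip-compressed files directly so they need not be unzipped first.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A truncated download or stray blank lines would land here.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling `count_fastq_reads` on each file in `../example_fastq_3` and getting nonzero counts would suggest the reads are present and that the problem lies in how the files are discovered or parsed, rather than in the download itself.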

>> main_5R('../example_fastq_3','../GG_5R','../example_results/5R_SMURF_example_3.txt',126)
Input params are:
Working on files in directory: ../example_fastq_3
Reconstruction using kmers of length: 126
get_configs.m contains the primers sequences as appear in the Nejman et al. manuscript. This file will be decrypted uppon acceptance.
WORKING ON SAMPLE 4: SRS6470169
Number of reads: 0
Percent of long enough reads: NaN
Percent of good reads: NaN
Loading bacterial DB for region 1 out of 5 from original region 1
Loading bacterial DB for region 2 out of 5 from original region 2
Loading bacterial DB for region 3 out of 5 from original region 3
Loading bacterial DB for region 4 out of 5 from original region 4
Loading bacterial DB for region 5 out of 5 from original region 5
Region 1 out of 5
Keep high freq: NaN% of reads
Keep high freq: NaN% of counts
Building matrix M
Building matrix A
--------------------------------------------
Region 2 out of 5
Keep high freq: NaN% of reads
Keep high freq: NaN% of counts
Building matrix M
Building matrix A
--------------------------------------------
Region 3 out of 5
Keep high freq: NaN% of reads
Keep high freq: NaN% of counts
Building matrix M
Building matrix A
--------------------------------------------
Region 4 out of 5
Keep high freq: NaN% of reads
Keep high freq: NaN% of counts
Building matrix M
Building matrix A
--------------------------------------------
Region 5 out of 5
Keep high freq: NaN% of reads
Keep high freq: NaN% of counts
Building matrix M
Building matrix A
--------------------------------------------
Warning: Make sure PE is supported properly
> In solve_iterative_noisy (line 4)
In reconstruction_func (line 40)
In main_multiple_regions (line 56)
In main_5R (line 57)
Region 1 out of 5
Keeping reads matched to DB: NaN% of reads
Keeping reads matched to DB: NaN% of counts
--------------------------------------------
Region 2 out of 5
Keeping reads matched to DB: NaN% of reads
Keeping reads matched to DB: NaN% of counts
--------------------------------------------
Region 3 out of 5
Keeping reads matched to DB: NaN% of reads
Keeping reads matched to DB: NaN% of counts
--------------------------------------------
Region 4 out of 5
Keeping reads matched to DB: NaN% of reads
Keeping reads matched to DB: NaN% of counts
--------------------------------------------
Region 5 out of 5
Keeping reads matched to DB: NaN% of reads
Keeping reads matched to DB: NaN% of counts
--------------------------------------------
Filter out columns (bacteria)
Warning: TAKE PROPER CARE OF NOT AMPLIFIED REGIONS
> In solve_iterative_noisy (line 90)
In reconstruction_func (line 40)
In main_multiple_regions (line 56)
In main_5R (line 57)
Normalize frequency counts
Build matrix A_L2
Making columns of A unique...
Removing included bacterias...
Removed 0 out of 0
Warning: Found 0 bacterias with non even number of reads mapped
> In solve_iterative_noisy (line 178)
In reconstruction_func (line 40)
In main_multiple_regions (line 56)
In main_5R (line 57)
Starting iterations...
Total iterations time: 9.5e-05
Building the Scott files for level: species
Loaded
Index in position 2 exceeds array bounds.

Error in build_scott_list_new (line 16)
full_names_table = cell2table(one_compact_cell(:,3:tl+2));

Error in scott_format_newer_func (line 138)
        build_scott_list_new(tl,all_answer_cell,all_gr_headers,all_passed_filt);

Error in main_5R (line 62)
scott_format_newer_func(batch_samples_list,results_filename)

zd200572 commented 3 years ago

I think this is because the filename is not recognized by the scripts? It may need to look like RDB123_ATGAGTGC_L006_R2_001.fastq
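To test this idea, a small script can check whether a filename follows the same shape as the example name above and build a name in that shape for a given sample. This is only a sketch under stated assumptions: the pattern is inferred from that single example filename, not from 5R's documentation, and the function names and the `NNNNNNNN` placeholder barcode are hypothetical. The original post reports that renaming in a similar style did not help, so this may not be the whole story.

```python
import re

# Pattern inferred from the one example filename quoted in this thread
# (RDB123_ATGAGTGC_L006_R2_001.fastq). It is an assumption, not a rule
# documented by 5R.
EXAMPLE_STYLE = re.compile(
    r"^(?P<sample>[^_]+)_(?P<barcode>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq$"
)


def matches_example_style(filename):
    """Return True if the filename has the same shape as the example name."""
    return EXAMPLE_STYLE.match(filename) is not None


def example_style_name(sample, read, barcode="NNNNNNNN", lane=1, chunk=1):
    """Build a filename in the example's shape.

    The default barcode is a placeholder, not a real index sequence.
    """
    return f"{sample}_{barcode}_L{lane:03d}_R{read}_{chunk:03d}.fastq"
```

For instance, `matches_example_style` accepts the example name but rejects a name such as `SRS6470169_1.fastq`, and `example_style_name("SRS6470169", 1)` produces a candidate name to try for the forward reads of that sample.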

ianutin commented 7 months ago

I also hit this error when I ran 5R on my own files. Have you found out how to solve it?