dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Encountered an Error. Message: No loci passed filters. #497

Closed jiangqiuqiuu closed 1 year ago

jiangqiuqiuu commented 1 year ago

Hi Isaac,

I encountered an issue while running ipyrad on my RAD-seq data. Steps 1-6 seemed to go well, but no loci remained after filtering. I then tried to branch the assembly with a new set of parameters and ran it again, but that failed with other errors (see below). I keep having problems with branching, so I guess I am misunderstanding something. Here is part of my log file:

ipyrad [v.0.9.85]

<Interactive assembly and analysis of RAD-seq data


Parallel connection | r4u08n1.puma.hpc.arizona.edu: 96 cores

Step 1: Loading sorted fastq data to Samples [####################] 100% 0:01:18 | loading reads
140 fastq files loaded to 140 Samples.

Step 2: Filtering and trimming reads [####################] 100% 0:09:26 | processing reads

Step 3: Clustering/Mapping reads within samples [####################] 100% 0:01:04 | dereplicating
[####################] 100% 0:43:54 | clustering/mapping
[####################] 100% 0:00:11 | building clusters
[####################] 100% 0:00:01 | chunking clusters
[####################] 100% 1:55:42 | aligning clusters
[####################] 100% 0:00:40 | concat clusters
[####################] 100% 0:00:13 | calc cluster stats

Step 4: Joint estimation of error rate and heterozygosity [####################] 100% 0:06:14 | inferring [H, E]

Step 5: Consensus base/allele calling Mean error [0.00374 sd=0.00073] Mean hetero [0.01281 sd=0.00273] [####################] 100% 0:00:09 | calculating depths
[####################] 100% 0:01:01 | chunking clusters
[####################] 100% 0:47:31 | consens calling
[####################] 100% 0:04:10 | indexing alleles

Step 6: Clustering/Mapping across samples [####################] 100% 0:00:17 | concatenating inputs
[####################] 100% 0:04:46 | clustering across
[####################] 100% 0:00:10 | building clusters
[####################] 100% 0:06:24 | aligning clusters

Step 7: Filtering and formatting output files [####################] 100% 0:00:21 | applying filters
/home/u12//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/assemble/write_outputs.py:355: FutureWarning: The behavior of series[i:j] with an integer-dtype index is deprecated. In a future version, this will be treated as label-based indexing, consistent with e.g. series[i] lookups. To retain the old behavior, use series.iloc[i:j]. To get the future behavior, use series.loc[i:j].
  {i: np.sum(covs[start:i]) for i in lrange},

Encountered an Error.
Message: No loci passed filters.
Parallel connection closed.>

With this error, I tried to branch the assembly with some new parameters. However, the branching command <ipyrad -p params-subset2.txt -b subset3> raised another error:

<loading Assembly: sel_subset_2
from saved path: /xdisk/msbarker/qiuyujiang/ipyrad/sel_subset_2.json
ipyrad.assemble.utils.IPyradError: The sample names in the assembly disagree with sample names in the pop_assign_file. Sample names in the pop_assign_file must exactly match sample names in the assembly, and you must specify a population for each sample in the assembly.

Names in the pop_assign_file that do not appear in the assembly:
    []

Samples in the assembly that are not specified in the pop_assign_file:
    []

NB: If you recently branched and removed samples you need to create a _new_
pop_assign_file which contains only the samples you retained in the new
branch (see https://github.com/dereneaton/ipyrad/issues/375).>

I read your previous answers to related questions, but I am sorry, I am very new to this. I tried to branch as you suggested, but I can't even create a branch. Could you please tell me what is wrong here and how I can solve it? Many thanks!

Qiuyu

isaacovercast commented 1 year ago

The problem is your min_samples_locus setting in step 7 is too high. Can you please show your parameters? You don't need to create a branch to change a parameter value, you can simply change the min_samples_locus setting to something much lower (like 4) and re-run step 7 and it will work fine.
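As a rough illustration of "change the parameter and re-run step 7", the sketch below rewrites one value in the text of a params file. It assumes the usual params layout of `value ## [index] [name]: description` seen later in this thread; the helper name `set_param` is hypothetical and is not part of ipyrad.

```python
import re


def set_param(params_text: str, name: str, value: str) -> str:
    """Return params_text with the value of parameter `name` replaced.

    Assumes each parameter line looks like:
        <value>   ## [21] [min_samples_locus]: description
    Lines without "##" (such as the header line) are copied unchanged.
    """
    out = []
    found = False
    for line in params_text.splitlines():
        _value, sep, tail = line.partition("##")
        if sep and re.search(r"\[%s\]" % re.escape(name), tail):
            # Keep the comment part, replace only the value column.
            line = f"{value:<30} ##{tail}"
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"parameter {name!r} not found in params text")
    return "\n".join(out) + "\n"


example = (
    "3    ## [21] [min_samples_locus]: Min # samples per locus for output\n"
    "0.2  ## [22] [max_SNPs_locus]: Max # SNPs per locus\n"
)
print(set_param(example, "min_samples_locus", "4"))
```

After saving the edited file, step 7 can be re-run on the existing assembly, for example with `ipyrad -p params-subset2.txt -s 7 -f`, where `-f` forces the step to overwrite its earlier results.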

jiangqiuqiuu commented 1 year ago

Hi Isaac,

Thank you so much for your response. My min_samples_locus was set to 3. I tried changing it to 1 and reran step 7, but another error came up:

<loading Assembly: sel_subset_2
from saved path: //ipyrad/sel_subset_2.json
Traceback (most recent call last):
  File "/.conda/envs/ipyrad/bin/ipyrad", line 10, in <module>
    sys.exit(main())
  File "//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/__main__.py", line 605, in main
    CLI()
  File "//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/__main__.py", line 69, in __init__
    self.get_assembly()
  File "//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/__main__.py", line 368, in get_assembly
    data.set_params(key, param)
  File "//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/core/assembly.py", line 493, in set_params
    setattr(self.params, param, newvalue)
  File "//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/core/params.py", line 267, in __setattr__
    super().__setattr__(key, val)
  File "//.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/core/params.py", line 465, in phred_Qscore_offset
    value = int(value)
ValueError: invalid literal for int() with base 10: ''>

Not sure what is causing the new error, your help will be much appreciated! Thanks.

Cheers Qiuyu

isaacovercast commented 1 year ago

If your minsamples was 3 that's already very low. Did you remove the pop_assign file? Can you paste here the results of ipyrad -p <yourparamsfile.txt> -r? This should show lots of stats about results.

jiangqiuqiuu commented 1 year ago

I did edit the pop_assign file while trying to fix the first error I encountered. Later I changed the popfile back to the original when I realized that changing it might be an issue. This is the error that came up after entering 'ipyrad -p params-subset2.txt -r':

loading Assembly: sel_subset_2
from saved path: /ipyrad/sel_subset_2.json
Traceback (most recent call last):
  File "/home/u12//.conda/envs/ipyrad/bin/ipyrad", line 10, in <module>
    sys.exit(main())
  File "/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/__main__.py", line 605, in main
    CLI()
  File ".conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/__main__.py", line 69, in __init__
    self.get_assembly()
  File "/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/__main__.py", line 368, in get_assembly
    data.set_params(key, param)
  File "/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/core/assembly.py", line 493, in set_params
    setattr(self.params, param, newvalue)
  File "/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/core/params.py", line 267, in __setattr__
    super().__setattr__(key, val)
  File "/.conda/envs/ipyrad/lib/python3.10/site-packages/ipyrad/core/params.py", line 465, in phred_Qscore_offset
    value = int(value)
ValueError: invalid literal for int() with base 10: ''

Cheers Qiuyu

isaacovercast commented 1 year ago

Can you post your params file? It looks like you have a bad value for phred_Qscore_offset

isaacovercast commented 1 year ago

It can't be empty. By default it's 33
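The traceback above ends in `int('')`, which is what Python raises when an integer parameter is left blank. A minimal pre-flight check along these lines could catch that before ipyrad loads the file. It assumes the `value ## [index] [name]: description` line layout; `parse_params` and `check_int_params` are hypothetical helpers, not ipyrad functions.

```python
def parse_params(params_text):
    """Map parameter name -> raw value string for every 'value ## [n] [name]: ...' line."""
    params = {}
    for line in params_text.splitlines():
        value, sep, tail = line.partition("##")
        if not sep:
            continue  # header or other line without a parameter comment
        # tail looks like " [10] [phred_Qscore_offset]: description"
        fields = tail.split("]")
        if len(fields) < 3:
            continue
        name = fields[1].strip().lstrip("[")
        params[name] = value.strip()
    return params


def check_int_params(params, names=("phred_Qscore_offset",)):
    """Return a list of messages for parameters that int() cannot parse."""
    problems = []
    for name in names:
        raw = params.get(name, "")
        try:
            int(raw)
        except ValueError:
            problems.append(f"{name}: expected an integer, got {raw!r}")
    return problems


blank = "      ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)\n"
print(check_int_params(parse_params(blank)))
```

Only `phred_Qscore_offset` is checked by default here because it is the parameter named in the traceback; other names can be passed in if needed.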

jiangqiuqiuu commented 1 year ago

Yes, that's right. I changed phred_Qscore_offset to 33 and ran it again, but ...

' loading Assembly: sel_subset_2
from saved path: /xdisk/msbarker/qiuyujiang/ipyrad/sel_subset_2.json
ipyrad.assemble.utils.IPyradError: The sample names in the assembly disagree with sample names in the pop_assign_file. Sample names in the pop_assign_file must exactly match sample names in the assembly, and you must specify a population for each sample in the assembly.

Names in the pop_assign_file that do not appear in the assembly:
    []

Samples in the assembly that are not specified in the pop_assign_file:
    ['***************************']

NB: If you recently branched and removed samples you need to create a _new_
pop_assign_file which contains only the samples you retained in the new
branch (see https://github.com/dereneaton/ipyrad/issues/375).'

There are many more samples in [sorted_fastq_path] than in the [pop_assign_file]; I guess that's the issue.

isaacovercast commented 1 year ago

Can you please remove the pop_assign_file and try again? This file doesn't do anything except allow you to set minimum samples per population for retaining data in step 7. It is not necessary as a first step and it seems that you have formatting issues with it. The pop_assign_file can only contain lines with sample ids and population labels. There can be no other lines containing "*****", which looks like what you have. Please remove the pop_assign_file or fix it and try again.
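One way to see which side of the mismatch is at fault is to compare the two name sets directly, mirroring the two lists in the error message. The sketch below is not ipyrad's own check. It assumes each assignment line is `sample population` separated by whitespace, and it skips lines starting with `#` on the assumption that such a line carries per-population settings rather than an assignment. The helper name and the population labels are made up for illustration.

```python
def check_pop_assign(pop_text, assembly_samples):
    """Compare sample names in a pop_assign_file against the assembly's samples."""
    assigned = {}
    bad_lines = []
    for lineno, line in enumerate(pop_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line, or a '#' line (assumed not to be an assignment)
        fields = stripped.split()
        if len(fields) != 2:
            # Anything that is not exactly "sample population" is reported.
            bad_lines.append((lineno, line))
            continue
        sample, pop = fields
        assigned[sample] = pop
    samples = set(assembly_samples)
    return {
        "not_in_assembly": sorted(set(assigned) - samples),
        "not_in_popfile": sorted(samples - set(assigned)),
        "bad_lines": bad_lines,
    }


pop_text = "s14 north\ns17 south\n*****\n"
print(check_pop_assign(pop_text, ["s14", "s17", "s18"]))
```

A non-empty `not_in_popfile` list corresponds to "Samples in the assembly that are not specified in the pop_assign_file", and `bad_lines` points at stray lines such as the asterisks mentioned above.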

pardis-fld commented 1 year ago

Hello, I have the same problem. When I include the pop_assign_file, I get an error in step 7. When I remove the pop_assign_file from the params file, I get the output files, but then I get an error in the Jupyter notebook. In your opinion, which part is the problem? My params file:

iptest_fastqs/*.fastq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo ## [5] [assembly_method]: Assembly method (denovo, reference)
 ## [6] [reference_sequence]: Location of reference sequence file
ddrad ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCA, TATG ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5 ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33 ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6 ## [11] [mindepth_statistical]: Min depth for statistical base calling
6 ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000 ## [13] [maxdepth]: Max cluster depth within samples
0.85 ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0 ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2 ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35 ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2 ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
0.05 ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus
0.05 ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus
4 ## [21] [min_samples_locus]: Min # samples per locus for output
0.2 ## [22] [max_SNPs_locus]: Max # SNPs per locus
8 ## [23] [max_Indels_locus]: Max # of indels per locus
0.5 ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus
0, 0, 0, 0 ## [25] [trim_reads]: Trim raw read edges (R1>, <R1, R2>, <R2) (see docs)
0, 0, 0, 0 ## [26] [trim_loci]: Trim locus edges (see docs) (R1>, <R1, R2>, <R2)

The number of loci recovered for each Sample.

ipyrad API location: [assembly].stats_dfs.s7_samples

sample_coverage
s14 5
s17 13
s18 17
s19 15
s20 11
s21 14
s22 2
s23 10
s24 16
s25 7
s26 12
s27 10
s28 11
s30 10
s31 10
s32 8
s33 8
s34 7
s36 8
s37 17
s38 8
s39 12
s40 24
s41 9
s42 8
s43 5
s44 7

Jupyter error:
Samples: 27
Sites before filtering: 212
Filtered (indels): 40
Filtered (bi-allel): 2
Filtered (mincov): 0
Filtered (minmap): 212
Filtered (subsample invariant): 2
Filtered (minor allele frequency): 0
Filtered (combined): 212
IPyradError: No SNPs passed filtering.

isaacovercast commented 1 year ago

@pardis-fld The error says "No SNPs passed filtering.", which means you are removing all your data with the pop_assign_file and the min samples per locus per population. When you remove the pop_assign_file, it removes the constraint on min samples per population, which is why some of the data is retained. Overall it looks like you already have a very small amount of data (212 SNPs); a typical assembly would have 10x or 100x more data than that, so you might go back and see whether you can recover more data from the assembly, if that is possible.
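To make the per-population constraint concrete, here is a deliberately simplified model of such a filter. It is not ipyrad's implementation: a locus is represented only by the list of samples that have data for it, `imap` maps each population to its samples, and `minmap` gives the minimum number of samples required per population. The sample groupings and thresholds are invented for illustration.

```python
def passes_minmap(locus_samples, imap, minmap):
    """True if the locus has data for at least minmap[pop] samples in every population."""
    present = set(locus_samples)
    return all(
        len(present & set(members)) >= minmap[pop]
        for pop, members in imap.items()
    )


def count_passing(loci, imap, minmap):
    """Number of loci that satisfy the per-population minimum."""
    return sum(passes_minmap(locus, imap, minmap) for locus in loci)


# Hypothetical populations and loci with sparse sample coverage.
imap = {"north": ["s14", "s17", "s18"], "south": ["s19", "s20", "s21"]}
loci = [
    ["s14", "s17", "s19"],
    ["s17", "s18"],
    ["s14", "s19", "s20", "s21"],
]

for minimum in (2, 1, 0):
    minmap = {"north": minimum, "south": minimum}
    print(minimum, count_passing(loci, imap, minmap))
```

With sparse coverage, even a modest per-population minimum can reject every locus, while a minimum of zero retains all of them, which matches the behavior described above when the pop_assign_file is removed.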

isaacovercast commented 1 year ago

This is more a problem with the data and not ipyrad so I am going to close this issue, but please feel free to jump on the gitter channel if you have more questions:

https://app.gitter.im/#/room/#dereneaton_ipyrad:gitter.im