dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

No loci pass filtered_by_min_sample in step 7 #367

Closed DennisLarsson closed 4 years ago

DennisLarsson commented 4 years ago

Hello! I have previously worked with Stacks and wanted to compare its output for my samples with the output of ipyrad. However, I quickly ran into trouble. I downloaded and installed ipyrad using the recommended miniconda approach. Running the command ipyrad -n test writes the params file as expected. I set up the params file to take my gzipped, demultiplexed fastq files (demultiplexed using illumina2bam and stacks process_radtags) and a populations file. However, after setting up the params file, I get an error when running the command: ipyrad -p params-test.txt -s 1

Traceback (most recent call last):
  File "/home/biogeoanalysis/miniconda3/bin/ipyrad", line 10, in <module>
    sys.exit(main())
  File "/home/biogeoanalysis/miniconda3/lib/python3.7/site-packages/ipyrad/main.py", line 598, in main
    CLI()
  File "/home/biogeoanalysis/miniconda3/lib/python3.7/site-packages/ipyrad/main.py", line 69, in __init__
    self.get_assembly()
  File "/home/biogeoanalysis/miniconda3/lib/python3.7/site-packages/ipyrad/main.py", line 369, in get_assembly
    data.set_params(key, param)
  File "/home/biogeoanalysis/miniconda3/lib/python3.7/site-packages/ipyrad/core/assembly.py", line 486, in set_params
    setattr(self.params, param, newvalue)
  File "/home/biogeoanalysis/miniconda3/lib/python3.7/site-packages/ipyrad/core/params.py", line 714, in pop_assign_file
    self._link_populations()
AttributeError: 'Params' object has no attribute '_link_populations'

I suspected that the formatting in the populations file was wrong, but upon double checking it is correct. I then looked into the params file, but everything looks fine. Could someone help me figure out the problem?

I have attached the params and populations file: params-test.txt popmap_pop.txt

DennisLarsson commented 4 years ago

I have noticed additional errors over the last few days. First, when I run the assembly process without a populations file, it runs without complaint, but I end up with 0 loci. In the stats file for step 7 I can see that all the loci are filtered out during the last filtering step, the minimum number of samples per locus. It filters out all loci, no matter what number I put in the params file. Somehow they seem not to be recognized as shared loci.

Second, even when I assemble from scratch (at least within one library; I am still working on multi-library assembly), including a populations file leads to the same error as above. Since it is a Python error and thus presumably not anticipated by the developers, I suspect that something was broken in a recent patch. If my populations file were wrongly formatted, shouldn't the script tell me so? Anyway, I will keep working. I read that Isaac is doing his defense soon, so I won't expect much help. I am a PhD student myself, so I can sympathize...

DennisLarsson commented 4 years ago

I finished an assembly of multiple libraries from scratch over the weekend. It has the same problem as trying to assemble from prefiltered fastq files. It says all loci were removed in the filtered_by_min_sample step.
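For readers unfamiliar with this kind of filter, here is a minimal sketch of what a minimum-samples-per-locus filter generally does. It is an illustration only, not ipyrad's implementation: the function name filter_by_min_sample, the data layout, and the toy loci are all assumptions made for this example.

```python
from typing import Dict, List, Set


def filter_by_min_sample(loci: Dict[str, Set[str]], min_samples: int) -> List[str]:
    """Return the names of loci with data for at least `min_samples` samples.

    Hypothetical helper for illustration; not part of ipyrad.
    """
    return [name for name, samples in loci.items() if len(samples) >= min_samples]


# Hypothetical toy data: locus name -> set of samples with data at that locus.
loci = {
    "locus_1": {"A", "B", "C", "D"},
    "locus_2": {"A", "B"},
    "locus_3": {"C"},
}

kept = filter_by_min_sample(loci, min_samples=2)
print(kept)
```

In a filter like this, losing every locus regardless of the threshold suggests that no locus is being counted as shared between samples at all, which points at an upstream problem rather than at the threshold value.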

isaacovercast commented 4 years ago

Whoops! Yeah there was a bug with allowing the pops file which we somehow didn't catch. This is fixed in 0.9.17 (6937de9), which should be up on bioconda within an hour or so.

isaacovercast commented 4 years ago

Also, it was not the format of the popmap file that was causing this error, even though your popmap file is malformed. You need to include a line at the end specifying the min numbers of samples to retain per population. Look up the formatting on the ipyrad docs.
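A quick way to check a populations file before running an assembly could look like the sketch below. It assumes the layout described in the ipyrad docs as I understand it: one "sample population" pair per line, followed by a final line such as "# popA:2 popB:1" giving the minimum number of samples per population. The function parse_popmap and the sample names are hypothetical and not part of ipyrad, so check the docs for the authoritative format.

```python
from typing import Dict, Tuple


def parse_popmap(text: str) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Parse popmap text into (sample -> population, population -> minimum).

    Hypothetical checker; assumes a final '# pop:N ...' line as described above.
    """
    samples: Dict[str, str] = {}
    minimums: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Collect 'pop:N' items; ignore anything else on a '#' line.
            for item in line.lstrip("#").split():
                pop, sep, count = item.partition(":")
                if sep and count.isdigit():
                    minimums[pop] = int(count)
        else:
            # Raises ValueError if the line is not exactly two fields.
            sample, pop = line.split()
            samples[sample] = pop
    missing = set(samples.values()) - set(minimums)
    if missing:
        raise ValueError(f"no minimum sample count given for: {sorted(missing)}")
    return samples, minimums


# Hypothetical example input.
good = "s1 popA\ns2 popA\ns3 popB\n# popA:2 popB:1\n"
samples, minimums = parse_popmap(good)
print(samples, minimums)
```

A file that lists samples but lacks the final minimums line raises a ValueError here, which is the kind of explicit message asked for earlier in the thread.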

isaacovercast commented 4 years ago

"I finished an assembly of multiple libraries from scratch over the weekend. It has the same problem as trying to assemble from prefiltered fastq files. It says all loci were removed in the filtered_by_min_sample step."

This is a totally unrelated problem, so it's typically good practice to have one issue per problem. It's also good practice to include exact information, for example what does the first section of filtering results from this run look like? Can I see it?

isaacovercast commented 4 years ago

Updating issue name to reflect the new problem.

DennisLarsson commented 4 years ago

"I finished an assembly of multiple libraries from scratch over the weekend. It has the same problem as trying to assemble from prefiltered fastq files. It says all loci were removed in the filtered_by_min_sample step."

"This is a totally unrelated problem, so it's typically good practice to have one issue per problem. It's also good practice to include exact information, for example what does the first section of filtering results from this run look like? Can I see it?"

I am sorry for making such a mess of this topic. I kept running into more problems as I wrote and added them in without thinking about consistency.

I have attached the stats file for the output of the last step: phyteuma_stats.txt

However, I just reran it using the latest update (0.9.16) and now it works fine. I thought that my installation might be corrupt, so I reinstalled conda and ipyrad, and maybe it was that simple. Or it was fixed in the update. I will keep you posted if I run into it again, but it is working for now! Sorry to have bothered you if it was just a bad install.

isaacovercast commented 4 years ago

Wait, so what does "now it works fine" refer to exactly? You are getting loci that pass the min_sample filter? If so, that's good news, but also weird.

DennisLarsson commented 4 years ago

"Wait, so what does "now it works fine" refer to exactly? You are getting loci that pass the min_sample filter? If so, that's good news, but also weird."

Yes, exactly. I am still trying to see if I can replicate the problem, but now loci do pass the min_sample filter. All I did was reinstall conda and ipyrad. When I reinstalled, it of course installed the latest version, 0.9.16, instead of 0.9.15, which I had before. If nothing related changed in the update, then it was probably a corrupt install on my part. I have had that problem before and I should have tried that before I posted here... Again, I am trying to see whether this was just a fluke (maybe I did something different?) and the problem remains. But most likely it was a corrupt install. Thanks for your time nonetheless! phyteuma_import_stats.txt

DennisLarsson commented 4 years ago

Ok, I figured out what I was doing wrong earlier. I removed the comma after the restriction overhang tag, changing the line from this:

TGCAG, ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)

to this:

TGCAG ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)

I obviously didn't follow the example in the comment, and it caused ipyrad to reject all loci at the end of the pipeline. The params file that I attached in the first post does have the comma, but that is the params file with which I discovered the populations bug. Later, I must have removed the comma and discovered that the min_sample filter rejects all loci if you remove the comma. Again, thanks for your patience, and I am sorry for being inattentive...
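To illustrate why the trailing comma can matter, here is a minimal sketch of a tolerant parser for a value of the form (cut1,) or (cut1, cut2). It is not ipyrad's actual parsing code: parse_overhang is a hypothetical helper, and the second overhang CGG is made up for the example.

```python
from typing import Tuple


def parse_overhang(value: str) -> Tuple[str, str]:
    """Normalize 'cut1', 'cut1,' or 'cut1, cut2' to a (cut1, cut2) pair.

    Hypothetical helper for illustration; cut2 is '' when only one cutter is given.
    """
    # Split on commas and drop the empty piece left by a trailing comma.
    parts = [piece.strip() for piece in value.split(",")]
    parts = [piece for piece in parts if piece]
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"expected one or two overhangs, got: {value!r}")
    cut1 = parts[0]
    cut2 = parts[1] if len(parts) == 2 else ""
    return cut1, cut2


print(parse_overhang("TGCAG,"))      # value with the trailing comma
print(parse_overhang("TGCAG"))       # value without the comma
print(parse_overhang("TGCAG, CGG"))  # two cutters (CGG is a made-up example)
```

A parser written this way treats TGCAG and TGCAG, identically. Code that instead indexes directly into the raw value could read a bare string as a sequence of single characters, which is one plausible way a missing comma could silently change results downstream.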

isaacovercast commented 4 years ago

Actually, I should be thanking YOU, because you have discovered a really awful, evil bug! It should definitely not have been doing that! I figured out what's going on and fixed it for the next version.

jiangqiuqiuu commented 3 years ago

Hi Isaac, I am running into the same problem as the last one described here, even with updated conda and Python, and even though I tried removing the comma after the restriction overhang tag.

It keeps reporting:

Step 7: Filtering and formatting output files
[####################] 100% 0:00:06 | applying filters

Encountered an Error.
Message: No loci passed filters.
Parallel connection closed.

Thank you so much in advance.

isaacovercast commented 3 years ago

What version of ipyrad are you running? What is your min_samples_locus parameter value? Probably better to jump on the gitter channel https://gitter.im/dereneaton/ipyrad to figure out where the problem is happening.