Rosemeis / pcangsd

Framework for analyzing low depth NGS data in heterogeneous populations using PCA.
GNU General Public License v3.0
46 stars 11 forks source link

-filter error #44

Closed alxsimon closed 3 years ago

alxsimon commented 3 years ago

Hi, I have an error when using the -filter option on my data. It reads the ind_subset.txt ok with the right number, but the issue seems to come downstream from that.

As explained here #31 I created this file from a numpy array of 0 and 1 corresponding to the columns to select, and saved with numpy.savetxt (I couldn't make it work with the binary numpy.save method as it is loaded with numpy.genfromtxt in the script).

The options I used

PCAngsd v.1.0
Time: 20/01/2021 16:48:03
Directory: /shared/ifbstor1/projects/mytilus/angsd_calling
Options:
        -beagle results/post_analysis/subset.beagle.gz
        -filter results/post_analysis/ind_subset.txt
        -iter 200
        -e 10
        -snp_weights
        -admix
        -sites_save
        -threads 28
        -out results/post_analysis/pcangsd_indsub

And the error

PCAngsd v.1.0
Using 28 thread(s).

Parsing Beagle file.
Only loading 621/660 individuals.
Traceback (most recent call last):
  File "/opt/pcangsd/pcangsd.py", line 161, in <module>
    L = reader_cy.readBeagleFilter(args.beagle, nFilter, int(np.sum(nFilter)))
  File "reader_cy.pyx", line 58, in reader_cy.readBeagleFilter
    cpdef np.ndarray[DTYPE_t, ndim=2] readBeagleFilter(str beagle, \
  File "reader_cy.pyx", line 98, in reader_cy.readBeagleFilter
    L_np[s] = np.asarray(<float[:n]> L_ptr)
ValueError: could not broadcast input array from shape (660) into shape (1863)

Removing the filter option works fine, maybe I did not understand the format correctly.

Rosemeis commented 3 years ago

Hi,

Yeah I just changed how the filter works, sorry! I'm in the middle of also making a tutorial. :-) How many individuals do you have in your beagle file and how many lines do you have in your filter file?

Best, Jonas

alxsimon commented 3 years ago

I have 220 individuals in beagle, so 660 columns of genotype likelihoods (+3 for markers). In my filter file I have 660 lines with 621 ones (so 207 individuals). I hope that is clear enough. Best, Alexis

Rosemeis commented 3 years ago

Okay, sorry for it being confusing with the previous answer in the other issue, but you only need to specify a 1 or 0 for the 220 individuals and not the 220*3. Hope this helps!

Best, Jonas

alxsimon commented 3 years ago

Thank you very much for the fast replies ! However this does not seem to have solved the error:

PCAngsd v.1.0
Using 28 thread(s).

Parsing Beagle file.
Only loading 207/220 individuals.
Traceback (most recent call last):
  File "/opt/pcangsd/pcangsd.py", line 161, in <module>
    L = reader_cy.readBeagleFilter(args.beagle, nFilter, int(np.sum(nFilter)))
  File "reader_cy.pyx", line 58, in reader_cy.readBeagleFilter
    cpdef np.ndarray[DTYPE_t, ndim=2] readBeagleFilter(str beagle, \
  File "reader_cy.pyx", line 98, in reader_cy.readBeagleFilter
    L_np[s] = np.asarray(<float[:n]> L_ptr)
ValueError: could not broadcast input array from shape (660) into shape (621)
alxsimon commented 3 years ago

Shouldn't this bit L_np[s] = np.asarray(<float[:n]> L_ptr) be L_np[s] = np.asarray(<float[:N]> L_ptr) instead ?

(the variable iN is also defined and unused in this reader_cy.readBeagleFilter function)

Let me know if you prefer I test this myself and make a pull request.

Rosemeis commented 3 years ago

You are absolutely right! :-) Just fixed it. Thanks for checking it for me.

Best, Jonas

alxsimon commented 3 years ago

Happy to help Best, Alexis