VUmcCGP / wisecondor

WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR): Detect fetal trisomies and smaller CNV's in a maternal plasma sample using whole-genome data.

Wisecondor DEFRAG algorithm #51

Open smruti241 opened 3 years ago

smruti241 commented 3 years ago

Hi, I have gone through WISECONDOR and the DEFRAG script in the legacy branch. I found WisecondorX more efficient than WISECONDOR, but the DEFRAG script is not included in WisecondorX, so I had to use this one. I ran the script and got this error:

Testing classifier on trainingset.
Number of mislabeled points : 3
Sample DEFRAG subset ChrY DEFRAG whole ChrY Determined Gender Total number of reads Cluster % on Y
Traceback (most recent call last):
  File "defrag.py", line 265, in <module>
    prediction = gnb.predict(getYPerc(testSamplesPickle(testSample)))
TypeError: 'dict' object is not callable

I have prepared .gcc and .pickle files for the reference sets and placed them randomly in boydir and girldir (I don't know the gender, since these are unknown samples), and I took the test files with their respective .gcc and .pickle files. But it gives me this error. Can you please tell me what is going wrong? Also, do you have any idea how I could update this script for WisecondorX?

smruti241 commented 3 years ago

Has anyone used the DEFRAG script? If so, how did you use it? It is showing me the error above.

rstraver commented 3 years ago

Paging @dvanbeek ;)

dvanbeek commented 3 years ago

Hi @smruti241, I’m currently out of office with little internet access (and no machines to test on). I’ll get back to you in ~2.5 weeks if that’s okay. Just to double check: you did run the code with a 2.7 version of Python, right? Thanks.

rstraver commented 3 years ago

Checking your error myself:

  File "defrag.py", line 265, in <module>
    prediction = gnb.predict(getYPerc(testSamplesPickle(testSample)))

I don't think that is actually in the defrag.py script that is on github (not the one I find when I check the legacy branch anyway). Did you edit this script yourself?

The error is about the (testSample) part; it should be [testSample] instead, because testSamplesPickle is a dict and not a function.

smruti241 commented 3 years ago

prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))

I tried this as well, but I got this error:

Traceback (most recent call last):
  File "defrag.py", line 265, in <module>
    prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
    X = check_array(X, accept_sparse='csr')
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/utils/validation.py", line 545, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead: array=0.000588992650289. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

smruti241 commented 3 years ago

Then I changed the parentheses one by one to understand the actual error, but every time I changed them I got either this error:

Traceback (most recent call last):
  File "defrag.py", line 265, in <module>
    prediction = gnb.predict[(getYPerc[testSamplesPickle[testSample]])]
TypeError: 'function' object has no attribute '__getitem__'

or the error above.

smruti241 commented 3 years ago

Actually, both WISECONDOR and WisecondorX are working fine; I am only stuck on the defrag command. Before running it, I created the .pickle and .gcc files and put them in boydir, girldir and testdir. I created a conda environment with Python 2.7 and ran the whole pipeline.

rstraver commented 3 years ago

You are confusing function calls (()) and dictionary lookups ([]) here. I suggest you read up a bit on Python scripting before editing scripts like that.

This is correct: prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))

All your other attempts make no sense to python.
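
The difference between a call and a lookup can be shown with a minimal sketch. The names below (test_samples_pickle, get_total) are hypothetical stand-ins and are not the actual objects in defrag.py; the point is only that () on a dict and [] on a function both raise a TypeError.

```python
# Hypothetical stand-ins, not the real defrag.py objects.
test_samples_pickle = {"sampleA": [10, 20, 30]}  # a dict: use [] to look up a key


def get_total(values):  # a function: use () to call it
    return sum(values)


# Correct: look up the key with [], then call the function with ().
total = get_total(test_samples_pickle["sampleA"])

# Wrong: calling the dict with () raises TypeError.
try:
    test_samples_pickle("sampleA")
    call_error = None
except TypeError as err:
    call_error = str(err)

# Wrong: indexing the function with [] also raises TypeError.
try:
    get_total[test_samples_pickle["sampleA"]]
    index_error = None
except TypeError as err:
    index_error = str(err)

print(total)
print(call_error)
print(index_error)
```

The exact wording of the second message depends on the Python version, so it is printed here rather than assumed.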

For your other error: I don't know what you are doing. You edited the code and now it doesn't work, so I really can't help you with that. I don't think it's required for defrag.py to run either, so if you want to change or develop things, I really suggest you learn a bit more Python first.

smruti241 commented 3 years ago

prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))

I tried this as well, but I got this error:

Traceback (most recent call last):
  File "defrag.py", line 265, in <module>
    prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
    X = check_array(X, accept_sparse='csr')
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/utils/validation.py", line 545, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead: array=0.000588992650289. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I didn't change anything, but I am getting this error. Can you please tell me why?

smruti241 commented 3 years ago

I am not changing the script without cause. I know Python; that is why I changed it to suit my environment. But the script did not work even before I changed it.

smruti241 commented 3 years ago

You can see I haven't changed the line, but I still got this "Expected 2D array" error. Can you tell me what this is about?

rstraver commented 3 years ago

It's complaining about line 265, and that line is not there in the version I'm looking at, here: https://github.com/VUmcCGP/wisecondor/blob/13ed9af395da12a6b25eb2f9e6805973cd4cdbd8/defrag.py#L265

So I don't know what it is about. It looks like parts of the code before it were cut out, and therefore it no longer has the right info for that call.

smruti241 commented 3 years ago

It's line 265 in my code because I didn't copy the first few lines of the script (the wisecondor defrag script description). In the original code it's around line 290. Please have a look at that line and let me know. This is the only line that is troubling me.

rstraver commented 3 years ago

Please use a git checkout or download the whole package as a zip file instead of copy-pasting pieces of code in the future; this way it's nearly impossible to tell what errors are really about.

Anyway I'm guessing you only provided a single sample here? Perhaps you can try running with multiple?

smruti241 commented 3 years ago

Yes, I provided a single test sample. Should I use multiple test samples?

smruti241 commented 3 years ago

This is my command:

python defrag.py --scalingFactor 0.688334125062 --percYonMales 0.00146939199267 boydir/ girldir/ testdir/ outputfig

This is the output I get, ending with the error:

./DEFRAG_out

Script information:

Settings used:

boydir          boydir/
girldir         girldir/
maledir         None
outputfig       outputfig
percYonMales    0.00146939199267
scalingFactor   0.688334125062
testdir         testdir/

Processing:

Loading reference samples
Loading boy gcc: boydir/IonXpress_006_NIPT.gcc
Loading boy pickle: boydir/IonXpress_006_NIPT.pickle
Loading boy gcc: boydir/IonXpress_007_NIPT.gcc
Loading boy pickle: boydir/IonXpress_007_NIPT.pickle
Loading boy gcc: boydir/IonXpress_008_NIPT.gcc
Loading boy pickle: boydir/IonXpress_008_NIPT.pickle
Loading girl gcc: girldir/IonXpress_009_NIPT.gcc
Loading girl pickle: girldir/IonXpress_009_NIPT.pickle
Loading girl gcc: girldir/IonXpress_010_NIPT.gcc
Loading girl pickle: girldir/IonXpress_010_NIPT.pickle
Loading girl gcc: girldir/IonXpress_011_NIPT.gcc
Loading girl pickle: girldir/IonXpress_011_NIPT.pickle
Bins that are kept for subset Y analysis: [3]
Loading test samples
Loading test gcc: testdir/IonXpress_006_NIPT.gcc
Loading test pickle: testdir/IonXpress_006_NIPT.pickle
Loading test gcc: testdir/IonXpress_010_NIPT.gcc
Loading test pickle: testdir/IonXpress_010_NIPT.pickle
Loading test gcc: testdir/IonXpress_011_NIPT.gcc
Loading test pickle: testdir/IonXpress_011_NIPT.pickle
percYMales: 0.00146939199267
corrMalesMedian: 0.688334125062
Testing classifier on trainingset.
Number of mislabeled points : 3
Sample DEFRAG subset ChrY DEFRAG whole ChrY Determined Gender Total number of reads Cluster % on Y
Traceback (most recent call last):
  File "defrag_new.py", line 290, in <module>
    prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
    X = check_array(X, accept_sparse='csr')
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/utils/validation.py", line 545, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead: array=0.000588992650289. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

What should I do now? I haven't changed anything in your script and ran it exactly as it is. Please help me.

smruti241 commented 3 years ago

@dvanbeek 2-5 weeks would be very late. Please help me with this during this week or next week. I ran the code in Python 2.7 only, but it gives me the error above. I tried everything: I increased the number of test and training samples and made small changes to the script, but all in vain. It would be very kind of you to help me ASAP.

smruti241 commented 3 years ago

Someone please help me. I want to use this DEFRAG script specifically; I don't have any other option.

smruti241 commented 3 years ago

@rstraver @dvanbeek Can you please help me with the DEFRAG script? I am unable to run it and I want to use it. It has been quite a long time since you last responded to my query.

smruti241 commented 3 years ago

@rstraver @dvanbeek I have been asking you both about this for the past month. If you don't want to help me, please tell me; I won't mind at all. It feels like I am begging you. I am using your tools and facing problems, which is why I am asking for help. Nearly all of the authors on your paper didn't know about wisecondor or defrag or how to use them. How did you work on it before, then? Please, if possible, help me with defrag only. @rstraver I don't want any help with sanefalcon, because I don't think it is worthwhile for fetal fraction prediction.

rstraver commented 3 years ago

Hi again, sorry for being unresponsive for this long. I wanted to wait for Daphne to come back from holiday and hoped she had some useful pointers, but that seems unlikely to help much. Altogether we get the impression you are trying a lot of things because WisecondorX didn't work, but based on what I've seen, that's more likely to do with the data you put in than that implementation being off.

Additionally, pushing for answers from someone on holiday (over other channels) and copying code while omitting the comments that include author and copyright info is considered pretty rude at best, if not suggestive of plagiarism. Seriously, don't do that.

This code is still available but not actively maintained, it's fairly old. I think it's best if you reconsider using WisecondorX instead. If you really need to make defrag work, ask someone around you with some deeper python knowledge to debug it on your machine. It seems there's something off on your end (machine/usage) that we cannot reproduce/resolve within reasonable efforts.

smruti241 commented 3 years ago

@rstraver I didn't know that Daphne was on holiday. WisecondorX worked best for me. I thought it would be great if I could make defrag work with WisecondorX, but this script is not working with wisecondor either. Secondly, I didn't omit anything from your code; I just downloaded it and ran it, and it gave the same error I mentioned above. If someone around me could have helped me, why would I have asked you? Anyway, if it does not work with WisecondorX, it is better for me to drop this tool. Thank you for your time.

Karma0alpha commented 1 year ago

@smruti241 @rstraver I've encountered the same error when running this script:

  File "defrag_new.py", line 290, in <module>
    prediction = gnb.predict(getYPerc(testSamplesPickle[testSample]))
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/neighbors/classification.py", line 147, in predict
    X = check_array(X, accept_sparse='csr')
  File "/home/mdrcubuntu/anaconda3/envs/wisecondor/lib/python2.7/site-packages/sklearn/utils/validation.py", line 545, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead: array=0.000588992650289. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I then ran the script step by step in a Jupyter notebook to figure out the problem. gnb.predict only accepts a 2D np.array such as [[0.01]], but at line 290 the value passed in, getYPerc(testSamplesPickle[testSample]), is just a float, so it raises an error. Modifying line 290 to

gnb.predict(np.array(getYPerc(testSamplesPickle[testSample])).reshape(-1,1))

fixes the issue. I focused on the defrag.py script because the Gaussian model in WisecondorX cannot predict the sex of NIPT samples very accurately, and I plan to improve it by testing its performance when run only on the male-specific region of the Y chromosome. Good luck!
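
A minimal sketch of what the reshape does, assuming only NumPy is available. The value is the one printed in the error message; the variable names are illustrative and do not come from defrag.py.

```python
import numpy as np

# Value copied from the ValueError message; in defrag.py it would be the
# float returned by getYPerc(testSamplesPickle[testSample]).
y_perc = 0.000588992650289

scalar = np.array(y_perc)          # 0-dimensional array: the shape that was rejected
as_column = scalar.reshape(-1, 1)  # 2D array: one sample with one feature
as_row = scalar.reshape(1, -1)     # also shape (1, 1) when there is a single value

print(scalar.ndim, as_column.shape, as_row.shape)  # 0 (1, 1) (1, 1)
```

For a single float both reshape variants give the same (1, 1) array, so either form satisfies the 2D requirement named in the error message.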