abdrakhimov1 / Biosaur

Apache License 2.0
14 stars, 6 forks

'bool' object has no attribute 'mz_array' when processing TDF data #20

Closed. mafreitas closed this issue 3 years ago.

mafreitas commented 3 years ago

1) Thanks for porting this tool to Python.

2) I am processing a timsTOF data file. I converted it with msconvert using the suggested settings, but I get the following error (once per process):

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib64/python3.7/site-packages/biosaur_src/funcs.py", line 657, in worker_data_to_features
    end_index
  File "/usr/local/lib64/python3.7/site-packages/biosaur_src/funcs.py", line 62, in data_to_features
    peak_ion_mobility_object.mz_array)
AttributeError: 'bool' object has no attribute 'mz_array'
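Reading the traceback, `data_to_features` received `False` where it expected a peak object and then tried to read `.mz_array` from it. The sketch below reproduces that failure mode and shows a defensive check. All names (`HypotheticalPeak`, `get_peak`, `mz_values`) are illustrative assumptions, not Biosaur's code, and the fix in Biosaur's latest commit may work differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class HypotheticalPeak:
    """Illustrative stand-in for a peak object that exposes an m/z array."""
    mz_array: List[float] = field(default_factory=list)


def get_peak(found: bool) -> Union[HypotheticalPeak, bool]:
    """Mimics a helper that returns False instead of an object on failure."""
    if not found:
        return False
    return HypotheticalPeak(mz_array=[500.25, 500.75])


def mz_values(found: bool) -> Optional[List[float]]:
    peak = get_peak(found)
    # Reading peak.mz_array while peak is False would raise
    # AttributeError: 'bool' object has no attribute 'mz_array'.
    if peak is False:
        return None  # skip this scan instead of crashing the worker
    return peak.mz_array
```

Returning a sentinel like `False` from a function that normally returns an object forces every caller to check for it; returning `None` with an `Optional` type hint, or raising an exception, makes the missing case harder to overlook.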

mafreitas commented 3 years ago

I just wanted to add that the above error occurred after installing via pip.

I also installed from source; that version gets a little further:

2021-05-13 00:35:30 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Starting program with following params...
2021-05-13 00:35:30 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Starting with args: {
    "input_mzml_path": [
        "20201204_Pool_5x_dilute_iRT_Slot1-20_1_3484.mzML"
    ],
    "number_of_processes": 0,
    "correlation_map": false,
    "negative_mode": false,
    "mass_accuracy": 8,
    "min_charge": 1,
    "max_charge": 6,
    "min_length": 3,
    "min_length_hill": 2,
    "min_intensity": 50000.0,
    "hill_valley_factor": 1.3,
    "debug": false,
    "output_file": null,
    "pep_xml_file_path": "0",
    "faims": false
}
2021-05-13 00:35:30 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Reading scans...
2021-05-13 00:42:47 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Number of MS1 scans: 9732
2021-05-13 00:42:47 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Converting your data, using maximum amount of processes...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21248] INFO Data converted to features with process /2/ --->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21251] INFO Data converted to features with process /5/ --->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21250] INFO Data converted to features with process /4/ --->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21247] INFO Data converted to features with process /1/ --->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21252] INFO Data converted to features with process /6/ --->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21249] INFO Data converted to features with process /3/ --->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO All data converted to hills...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Processing hills...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Your hills proccesing with maximum amount of processes...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO 0 hills were detected...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO 0 hills were detected...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Start recalc_fast_array_for_finished_hills...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[20950] INFO Start boosting_secondstep_with_processes...
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21266] INFO All hills were iterated correctly with this process /1/ -->
2021-05-13 00:42:50 ip-172-31-20-84.us-east-2.compute.internal root[21267] INFO All hills were iterated correctly with this process /2/ -->
Traceback (most recent call last):
  File "/usr/local/bin/biosaur", line 11, in <module>
    load_entry_point('Biosaur==2.0.0', 'console_scripts', 'biosaur')()
  File "/usr/local/lib/python3.7/site-packages/biosaur_src/biosaur.py", line 136, in run
    return bio.process_files(args)
  File "/usr/local/lib/python3.7/site-packages/biosaur_src/bio.py", line 264, in process_files
    min_length)
  File "/usr/local/lib/python3.7/site-packages/biosaur_src/funcs.py", line 880, in boosting_secondstep_with_processes
    ready = sorted(ready, key=lambda x: -len(x[1]))
TypeError: 'bool' object is not iterable
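The last frame calls `sorted(ready, key=lambda x: -len(x[1]))`. The message `'bool' object is not iterable` means `ready` itself was a bool rather than a list (a bool inside an element would instead produce an error about `len()`), which is consistent with the earlier "0 hills were detected" log lines. A minimal sketch of a guard, assuming a bool placeholder stands for "no results"; `sort_ready` and the `(id, values)` tuple shape are hypothetical, not Biosaur's API:

```python
from typing import List, Tuple, Union

# Hypothetical result shape: (identifier, list of values).
Result = Tuple[int, List[float]]


def sort_ready(ready: Union[List[Result], bool]) -> List[Result]:
    """Sort results by descending length of their second element.

    sorted(False, ...) raises TypeError: 'bool' object is not iterable,
    so a bool placeholder (e.g. when no hills were found) becomes [].
    """
    if isinstance(ready, bool):
        return []
    # Negating the length sorts longest-first with the default ascending order.
    return sorted(ready, key=lambda x: -len(x[1]))
```

An empty list is usually a safer "nothing found" value than `False`, because every downstream loop, `len()`, or `sorted()` call handles it without special cases.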

The command I used for both was:

biosaur -mini 50000 20201204_Pool_5x_dilute_iRT_Slot1-20_1_3484.mzML

markmipt commented 3 years ago

Dear Michael,

The PyPI version is outdated, and we are having some issues uploading fresh releases to PyPI right now. However, the latest commit already fixes the issue mentioned in your first message.

As for the second issue, the problem is that Biosaur does not detect any m/z traces (hills); the log reports "0 hills were detected". I believe the minimum intensity threshold (-mini 50000) is the cause. In my experience with timsTOF data, most peptide peaks have intensities in the range 100-10000 (and noise peaks in the range 1-10), so a cutoff of 50000 discards almost every peak. You should try reducing the -mini threshold to a value between 10 and 1000.
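To make the threshold argument concrete, here is a toy sketch of a minimum-intensity filter. The intensities are made-up illustrative values inside the ranges quoted above, not measurements, and `keep_above` with its `>=` comparison is an assumption; Biosaur's actual filtering code may compare differently.

```python
from typing import List


def keep_above(intensities: List[float], min_intensity: float) -> List[float]:
    """Keep points at or above the threshold (a toy -mini style cutoff)."""
    return [i for i in intensities if i >= min_intensity]


# Made-up intensities chosen inside the ranges quoted above:
noise = [1.0, 4.0, 10.0]                   # noise peaks: roughly 1-10
peptides = [150.0, 900.0, 4200.0, 9800.0]  # peptide peaks: roughly 100-10000
points = noise + peptides

print(keep_above(points, 50000))  # [] -> nothing left to build hills from
print(keep_above(points, 100))    # [150.0, 900.0, 4200.0, 9800.0]
```

With a cutoff above the whole peptide range nothing survives, which matches the empty hill list in the log; a cutoff between the noise and peptide ranges keeps the signal and drops the noise.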

We will update Biosaur's log messages to give clearer output in the near future.

Regards, Mark

mafreitas commented 3 years ago

That was the issue. However, I would like to point out that if the worker processes run out of memory, the program seems to hang forever. I did not spend much time diagnosing it. I have also built a Docker image; if you would like to include it in the repo, I can open a pull request.
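One plausible explanation for the hang, not confirmed in this thread: the Linux OOM killer terminates a worker with SIGKILL before it sends its result back, and a parent that blocks waiting for that result never wakes up. A minimal sketch, assuming `multiprocessing` workers; `hypothetical_worker` and `run_workers` are illustrative names, not Biosaur code. The parent uses a bounded `join(timeout)` and inspects `exitcode`, which is negative (`-N`) when a process was killed by signal `N`.

```python
import multiprocessing as mp
import os
import signal


def hypothetical_worker(simulate_oom: bool) -> None:
    """Stand-in for a feature-detection worker (hypothetical)."""
    if simulate_oom:
        # The Linux OOM killer ends a process with SIGKILL; mimic that here.
        os.kill(os.getpid(), signal.SIGKILL)
    # ... real work would go here ...


def run_workers(flags, timeout=60.0):
    """Start one process per flag and return each worker's exit code.

    0 means success, a negative value -N means the worker was killed by
    signal N (e.g. -9 for SIGKILL), and None means it is still running.
    """
    # "fork" keeps this sketch self-contained on Linux/macOS; with the default
    # start method, call this from under an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    procs = [ctx.Process(target=hypothetical_worker, args=(f,)) for f in flags]
    for p in procs:
        p.start()
    codes = []
    for p in procs:
        p.join(timeout)           # bounded wait instead of blocking forever
        codes.append(p.exitcode)
    return codes
```

A parent that sees a negative exit code can log which worker died and exit with an error, instead of waiting indefinitely on a result queue.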