MannLabs / directlfq

Fast and accurate label-free quantification for small and very large numbers of proteomes
https://www.mcponline.org/article/S1535-9476(23)00092-0/fulltext
Apache License 2.0

Error reading generic tsv #40

Open ht-lau opened 3 months ago

ht-lau commented 3 months ago

Hi,

I am using DIA-NN 1.9.1, so I tried to generate a generic tsv (attached). With the directlfq 0.2.20 GUI and no settings changed, I got the errors below. directlfq_in.zip

I also get a similar error if I run the diann.tsv in your test_data/unit_tests/input_table_formats folder and change the setting to DIANN precursor MS1 and MS2.

Thanks HT

Below is the error:

2024-08-23 15:45:20,642 - directlfq.lfq_manager - INFO - Starting directLFQ analysis.
ERROR:bokeh.server.protocol_handler:error handling message
 message: Message 'PATCH-DOC' content: {'events': [{'kind': 'MessageSent', 'msg_type': 'bokeh_event', 'msg_data': {'type': 'event', 'name': 'button_click', 'values': {'type': 'map', 'entries': [['model', {'id': 'p1149'}]]}}}]}
 error: TypeError('format not specified in intable_config.yaml!')
Traceback (most recent call last):
  File "bokeh\server\protocol_handler.py", line 97, in handle
    work = await handler(message, connection)
  File "bokeh\server\session.py", line 94, in _needs_document_lock_wrapper
    result = func(self, *args, **kwargs)
  File "bokeh\server\session.py", line 288, in _handle_patch
    message.apply_to_document(self.document, self)
  File "bokeh\protocol\messages\patch_doc.py", line 104, in apply_to_document
    invoke_with_curdoc(doc, lambda: doc.apply_json_patch(self.payload, setter=setter))
  File "bokeh\document\callbacks.py", line 443, in invoke_with_curdoc
    return f()
  File "bokeh\protocol\messages\patch_doc.py", line 104, in <lambda>
    invoke_with_curdoc(doc, lambda: doc.apply_json_patch(self.payload, setter=setter))
  File "bokeh\document\document.py", line 376, in apply_json_patch
    DocumentPatchedEvent.handle_event(self, event, setter)
  File "bokeh\document\events.py", line 246, in handle_event
    event_cls._handle_event(doc, event)
  File "bokeh\document\events.py", line 281, in _handle_event
    cb(event.msg_data)
  File "bokeh\document\callbacks.py", line 390, in trigger_event
    model._trigger_event(event)
  File "bokeh\util\callback_manager.py", line 113, in _trigger_event
    self.document.callbacks.notify_event(cast(Model, self), event, invoke)
  File "bokeh\document\callbacks.py", line 260, in notify_event
    invoke_with_curdoc(doc, callback_invoker)
  File "bokeh\document\callbacks.py", line 443, in invoke_with_curdoc
    return f()
  File "bokeh\util\callback_manager.py", line 109, in invoke
    cast(EventCallbackWithEvent, callback)(event)
  File "panel\reactive.py", line 491, in _server_event
    self._comm_event(doc, event)
  File "panel\reactive.py", line 478, in _comm_event
    state._handle_exception(e)
  File "panel\io\state.py", line 436, in _handle_exception
    raise exception
  File "panel\reactive.py", line 476, in _comm_event
    self._process_bokeh_event(doc, event)
  File "panel\reactive.py", line 413, in _process_bokeh_event
    self._process_event(event)
  File "panel\widgets\button.py", line 243, in _process_event
    self.clicks += 1
  File "param\parameterized.py", line 528, in _f
    instance_param.__set__(obj, val)
  File "param\parameterized.py", line 530, in _f
    return f(self, obj, val)
  File "param\parameters.py", line 543, in __set__
    super().__set__(obj,val)
  File "param\parameterized.py", line 530, in _f
    return f(self, obj, val)
  File "param\parameterized.py", line 1553, in __set__
    obj.param._call_watcher(watcher, event)
  File "param\parameterized.py", line 2526, in _call_watcher
    self._execute_watcher(watcher, (event,))
  File "param\parameterized.py", line 2506, in _execute_watcher
    watcher.fn(*args, **kwargs)
  File "directlfq\dashboard_parts.py", line 331, in run_pipeline
    lfq_manager.run_lfq(input_file = input_file, input_type_to_use = input_type_to_use, maximum_number_of_quadratic_ions_to_use_per_protein = 10,
  File "directlfq\lfq_manager.py", line 48, in run_lfq
    input_df = lfqutils.import_data(input_file=input_file, input_type_to_use=input_type_to_use, filter_dict=filter_dict)
  File "directlfq\utils.py", line 807, in import_data
    file_to_read = reformat_and_save_input_file(input_file=input_file, input_type_to_use=input_type_to_use, filter_dict=filter_dict)
  File "directlfq\utils.py", line 824, in reformat_and_save_input_file
    input_type, config_dict_for_type, sep = get_input_type_and_config_dict(input_file, input_type_to_use)
  File "directlfq\utils.py", line 890, in get_input_type_and_config_dict
    raise TypeError("format not specified in intable_config.yaml!")
TypeError: format not specified in intable_config.yaml!

ammarcsj commented 3 months ago

Hi, thanks for the feedback! It seems that DIA-NN changed some output table columns in the new version. I will adapt to this in the next release. Does setting it to "DIA-NN precursor.tsv" still work?

Additionally, you can save the reformatted file you have now with the ending .aq_reformat.tsv; directLFQ will then recognise that it is already in the correct format. In the new release I will add the option to set "directlfq_format" in the GUI (currently this is only available in the Python version).
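(Editor's note: a minimal sketch of this workaround, assuming a table that is already in the generic protein/ion layout shown further down in this thread. The file names are hypothetical; run_lfq is the entry point visible in the traceback above, called here with all other settings left at their defaults.)

```python
# Hypothetical file names; the table is assumed to already have the generic layout
# (a protein column, an ion column, and one intensity column per sample).
import shutil

import directlfq.lfq_manager as lfq_manager

reformatted = "my_quant_table.tsv"
recognised = "my_quant_table.aq_reformat.tsv"  # suffix that directLFQ is said to auto-recognise
shutil.copyfile(reformatted, recognised)

# run_lfq is the function seen in the traceback above; defaults are used for everything else.
lfq_manager.run_lfq(input_file=recognised)
```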

ht-lau commented 3 months ago

Yes. If I provide the report.tsv from DIA-NN 1.9.1 and set directLFQ to auto detect, directLFQ will run in precursor mode.

ammarcsj commented 3 months ago

Thanks for this feedback. Have you tried running the reformatted file as I suggested? I will let you know once the new release is out.

ht-lau commented 3 months ago

Yes, the .aq_reformat.tsv with auto detection worked well.

Maybe an unrelated question: what is the difference between the MS1, MS2 and MS1 + MS2 methods? Is there a different computation for MS1 + MS2, or is it just a different reformatting of the DIA-NN report?

Thanks

ammarcsj commented 3 months ago

Good to hear!

Regarding your question: the modes differ in the input information used to derive the protein quantities, i.e. which columns of the DIA-NN table are used for quantification. MS1 (which I would not recommend on its own most of the time) takes only quantities derived from the MS1 profile, MS2 takes only quantities derived from the fragment ions, and MS1 + MS2 takes both.

ht-lau commented 3 months ago

So it is like this?

MS1

protein   ion                 sample1   sample2
prot A    Precursor1
prot A    Precursor2
prot A    Precursor3
prot B    Precursor1
prot B    Precursor2

MS2

protein   ion                 sample1   sample2
prot A    Precursor1-frag1
prot A    Precursor1-frag2
prot A    Precursor1-frag3
prot A    Precursor2-frag1
prot A    Precursor2-frag2
prot B    Precursor1-frag1
prot B    Precursor1-frag2
prot B    Precursor2-frag1
prot B    Precursor2-frag2

MS1 + MS2

protein   ion                 sample1   sample2
prot A    Precursor1
prot A    Precursor1-frag1
prot A    Precursor1-frag2
prot A    Precursor1-frag3
prot A    Precursor2
prot A    Precursor2-frag1
prot A    Precursor2-frag2
prot A    Precursor3
prot B    Precursor1
prot B    Precursor1-frag1
prot B    Precursor1-frag2
prot B    Precursor2
prot B    Precursor2-frag1
prot B    Precursor2-frag2

ammarcsj commented 3 months ago

Hi, close - MS1 is correct, but MS2 is more like this:

MS2

Protein   Ion                            Sample 1   Sample 2
Prot A    Precursor1-summarized_frag
Prot A    Precursor2-summarized_frag
Prot B    Precursor1-summarized_frag
Prot B    Precursor2-summarized_frag

Precursor.Quantity or Precursor.Normalized in DIA-NN corresponds to the -summarized_frag quantity, while the column MS1.Area gives the summarized MS1 intensity.

You can use the individual fragment ions from DIA-NN, but in my benchmarks the summarized scores performed a bit better. If you still want to use fragment ions, I would not use fragment.ions.raw, but fragment.ions.corrected.
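(Editor's note: for illustration only, and not directLFQ's internal code, here is a hedged pandas sketch of how the three ion sets discussed above could be assembled from a DIA-NN report, using the quantity columns named in this thread, Precursor.Quantity for the summarized fragment quantity and MS1.Area for the MS1-level quantity. The identifier columns Run, Protein.Group and Precursor.Id are assumptions about the report layout.)

```python
import pandas as pd

# Hypothetical DIA-NN report in long format: one row per precursor per run.
report = pd.read_csv("report.tsv", sep="\t")

def ion_table(df, quant_col, suffix):
    # One row per (protein, precursor, run); the suffix keeps MS1- and MS2-derived ions distinct.
    out = df[["Protein.Group", "Precursor.Id", "Run", quant_col]].copy()
    out["ion"] = out["Precursor.Id"] + suffix
    return out.rename(columns={quant_col: "intensity"})[["Protein.Group", "ion", "Run", "intensity"]]

ms1_ions = ion_table(report, "MS1.Area", "_ms1")                       # MS1 mode
ms2_ions = ion_table(report, "Precursor.Quantity", "_summarized_frag")  # MS2 mode
ms1_and_ms2_ions = pd.concat([ms1_ions, ms2_ions])                      # MS1 + MS2 mode

# Pivot to the wide "protein / ion / one intensity column per sample" layout shown earlier.
wide = ms1_and_ms2_ions.pivot_table(index=["Protein.Group", "ion"], columns="Run", values="intensity")
```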

ammarcsj commented 1 month ago

Hi, the newest release now has .parquet compatibility. Feel free to try it out :)
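(Editor's note: a minimal usage sketch, assuming the new release lets run_lfq read a DIA-NN report.parquet path directly; the file name is hypothetical and all other settings are left at their defaults.)

```python
import directlfq.lfq_manager as lfq_manager

# Assumes a .parquet report is accepted directly as input_file in the new release.
lfq_manager.run_lfq(input_file="report.parquet")
```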