MannLabs / directlfq

Fast and accurate label-free quantification for small and very large numbers of proteomes
https://www.mcponline.org/article/S1535-9476(23)00092-0/fulltext
Apache License 2.0

directLFQ 0.2.16 fails with IndexError: list index out of range #26

Closed GeorgWa closed 9 months ago

GeorgWa commented 10 months ago

Describe the bug: The most recent version of directLFQ (0.2.16) fails with IndexError: list index out of range while running the alphaDIA test case.

To Reproduce:

  1. Run the test case test_output_transform() in alphadia/tests/unit_tests/test_outputtransform.py


Logs

================================================================================================= test session starts =================================================================================================
platform darwin -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/georgwallmann/Documents/git/alphadia
collected 59 items / 5 deselected / 54 selected                                                                                                                                                                       

tests/unit_tests/test_calibration.py ....                                                                                                                                                                       [  7%]
tests/unit_tests/test_data.py ..                                                                                                                                                                                [ 11%]
tests/unit_tests/test_fdr.py .....                                                                                                                                                                              [ 20%]
tests/unit_tests/test_fragcomp.py ...                                                                                                                                                                           [ 25%]
tests/unit_tests/test_grouping.py .........                                                                                                                                                                     [ 42%]
tests/unit_tests/test_libtransform.py .                                                                                                                                                                         [ 44%]
tests/unit_tests/test_numba.py ....                                                                                                                                                                             [ 51%]
tests/unit_tests/test_outputtransform.py F                                                                                                                                                                      [ 53%]
tests/unit_tests/test_plexscoring.py .                                                                                                                                                                          [ 55%]
tests/unit_tests/test_plotting.py ..                                                                                                                                                                            [ 59%]
tests/unit_tests/test_quadrupole.py ...                                                                                                                                                                         [ 64%]
tests/unit_tests/test_reporting.py ......                                                                                                                                                                       [ 75%]
tests/unit_tests/test_utils.py ....                                                                                                                                                                             [ 83%]
tests/unit_tests/test_workflow.py .........                                                                                                                                                                     [100%]

====================================================================================================== FAILURES =======================================================================================================
________________________________________________________________________________________________ test_output_transform ________________________________________________________________________________________________

    def test_output_transform():
        run_columns = ["run_0", "run_1", "run_2"]

        config = {
            "general": {
                "thread_count": 8,
            },
            "fdr": {
                "fdr": 0.01,
                "inference_strategy": "heuristic",
                "group_level": "proteins",
                "keep_decoys": False,
            },
            "search_output": {
                "min_k_fragments": 3,
                "min_correlation": 0.25,
                "num_samples_quadratic": 50,
                "min_nonnan": 1,
                "normalize_lfq": True,
                "peptide_level_lfq": False,
                "precursor_level_lfq": False,
            },
        }

        temp_folder = os.path.join(tempfile.gettempdir(), "alphadia")
        os.makedirs(temp_folder, exist_ok=True)

        progress_folder = os.path.join(temp_folder, "progress")
        os.makedirs(progress_folder, exist_ok=True)

        # setup raw folders
        raw_folders = [os.path.join(progress_folder, run) for run in run_columns]

        psm_base_df = _mock_precursor_df(n_precursor=100)
        fragment_base_df = _mock_fragment_df(n_precursor=200)

        for raw_folder in raw_folders:
            os.makedirs(raw_folder, exist_ok=True)

            psm_df = psm_base_df.sample(50)
            psm_df["run"] = os.path.basename(raw_folder)
            frag_df = fragment_base_df[
                fragment_base_df["precursor_idx"].isin(psm_df["precursor_idx"])
            ]

            frag_df.to_csv(os.path.join(raw_folder, "frag.tsv"), sep="\t", index=False)
            psm_df.to_csv(os.path.join(raw_folder, "psm.tsv"), sep="\t", index=False)

        output = outputtransform.SearchPlanOutput(config, temp_folder)
        _ = output.build_precursor_table(raw_folders, save=True)
        _ = output.build_stat_df(raw_folders, save=True)
>       _ = output.build_lfq_tables(raw_folders, save=True)

tests/unit_tests/test_outputtransform.py:169: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
alphadia/outputtransform.py:645: in build_lfq_tables
    lfq_df = qb.lfq(
alphadia/outputtransform.py:276: in lfq
    protein_df, _ = lfqprot_estimation.estimate_protein_intensities(
../../../miniconda3/envs/alpha/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:37: in estimate_protein_intensities
    ion_df = get_ion_intensity_dataframe_from_list_of_shifted_peptides(list_of_tuple_w_protein_profiles_and_shifted_peptides, allprots)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

list_of_tuple_w_protein_profiles_and_shifted_peptides = [(array([9.24812545, 9.24812545,        nan]),                               0         1   2
pg    ion                ...6417  1.6417  1.6417
      695990860382217  1.6417  1.6417  1.6417
      695995155349513  1.6417  1.6417  1.6417), ...]
allprots = ['EPROT', 'VPROT', 'ZPROT', 'LPROT', 'FPROT', 'SPROT', ...]

    def get_ion_intensity_dataframe_from_list_of_shifted_peptides(list_of_tuple_w_protein_profiles_and_shifted_peptides, allprots):
        ion_names = []
        ion_vals = []
        protein_names = []
        column_names = list_of_tuple_w_protein_profiles_and_shifted_peptides[0][1].columns.tolist()
        for idx in range(len(list_of_tuple_w_protein_profiles_and_shifted_peptides)):
>           protein_name = allprots[idx]
E           IndexError: list index out of range

../../../miniconda3/envs/alpha/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:206: IndexError
------------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------------
2024-01-24 12:08:13> Performing protein grouping and FDR
2024-01-24 12:08:13> Building output for run_0
2024-01-24 12:08:13> Building output for run_1
2024-01-24 12:08:13> Building output for run_2
2024-01-24 12:08:13> Building combined output
2024-01-24 12:08:13> Performing protein inference
2024-01-24 12:08:13> Inference strategy: heuristic. Using maximum parsimony with grouping for protein inference
2024-01-24 12:08:13> Performing protein FDR
2024-01-24 12:08:13> Test AUC: 1.000
2024-01-24 12:08:13> Train AUC: 1.000
2024-01-24 12:08:13> AUC difference: 0.00%
2024-01-24 12:08:13> ================ Protein FDR =================
2024-01-24 12:08:13> Unique protein groups in output
2024-01-24 12:08:13>   1% protein FDR: 24
2024-01-24 12:08:13> 
2024-01-24 12:08:13> Unique precursor in output
2024-01-24 12:08:13>   1% protein FDR: 42
2024-01-24 12:08:13> ================================================
2024-01-24 12:08:13> Writing precursor output to disk
2024-01-24 12:08:13> Building search statistics
2024-01-24 12:08:13> Reading precursors.tsv file
2024-01-24 12:08:13> Writing stat output to disk
2024-01-24 12:08:13> Performing label free quantification
2024-01-24 12:08:13> Reading precursors.tsv file
2024-01-24 12:08:13> Accumulating fragment data
2024-01-24 12:08:13> reading frag file for run_0
2024-01-24 12:08:13> reading frag file for run_1
2024-01-24 12:08:13> reading frag file for run_2
2024-01-24 12:08:13> Performing label free quantification on the pg level
2024-01-24 12:08:13> Filtering fragments by quality
2024-01-24 12:08:13> Performing label-free quantification using directLFQ
2024-01-24 12:08:13> to few values for normalization without missing values. Including missing values
2024-01-24 12:08:13> 24 lfq-groups total
2024-01-24 12:08:13> using 8 processes
2024-01-24 12:08:13> lfq-object 0
-------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------
PROGRESS root:outputtransform.py:419 Performing protein grouping and FDR
INFO     root:outputtransform.py:427 Building output for run_0
INFO     root:outputtransform.py:427 Building output for run_1
INFO     root:outputtransform.py:427 Building output for run_2
INFO     root:outputtransform.py:446 Building combined output
INFO     root:outputtransform.py:456 Performing protein inference
INFO     root:outputtransform.py:488 Inference strategy: heuristic. Using maximum parsimony with grouping for protein inference
INFO     root:outputtransform.py:501 Performing protein FDR
INFO     root:fdr.py:355 Test AUC: 1.000
INFO     root:fdr.py:356 Train AUC: 1.000
INFO     root:fdr.py:359 AUC difference: 0.00%
PROGRESS root:outputtransform.py:508 ================ Protein FDR =================
PROGRESS root:outputtransform.py:511 Unique protein groups in output
PROGRESS root:outputtransform.py:512   1% protein FDR: 24
PROGRESS root:outputtransform.py:513 
PROGRESS root:outputtransform.py:514 Unique precursor in output
PROGRESS root:outputtransform.py:515   1% protein FDR: 42
PROGRESS root:outputtransform.py:516 ================================================
INFO     root:outputtransform.py:524 Writing precursor output to disk
PROGRESS root:outputtransform.py:560 Building search statistics
INFO     root:outputtransform.py:390 Reading precursors.tsv file
INFO     root:outputtransform.py:576 Writing stat output to disk
PROGRESS root:outputtransform.py:607 Performing label free quantification
INFO     root:outputtransform.py:390 Reading precursors.tsv file
INFO     root:outputtransform.py:123 Accumulating fragment data
INFO     root:outputtransform.py:58 reading frag file for run_0
INFO     root:outputtransform.py:58 reading frag file for run_1
INFO     root:outputtransform.py:58 reading frag file for run_2
PROGRESS root:outputtransform.py:633 Performing label free quantification on the pg level
INFO     root:outputtransform.py:208 Filtering fragments by quality
INFO     root:outputtransform.py:255 Performing label-free quantification using directLFQ
INFO     directlfq.normalization:normalization.py:239 to few values for normalization without missing values. Including missing values
INFO     directlfq.protein_intensity_estimation:protein_intensity_estimation.py:32 24 lfq-groups total
INFO     directlfq.protein_intensity_estimation:protein_intensity_estimation.py:107 using 8 processes
================================================================================================== warnings summary ===================================================================================================
tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:189: FutureWarning: The provided callable <built-in function min> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
    index_df = frag_df.groupby("_candidate_idx", as_index=False).agg(

tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:189: FutureWarning: The provided callable <built-in function max> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
    index_df = frag_df.groupby("_candidate_idx", as_index=False).agg(

tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:247: FutureWarning: The provided callable <built-in function min> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
    index_df = psm_df.groupby("window_idx", as_index=False).agg(

tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:247: FutureWarning: The provided callable <built-in function max> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
    index_df = psm_df.groupby("window_idx", as_index=False).agg(

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/Documents/git/alphadia/alphadia/outputtransform.py:458: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    psm_df["mods"].fillna("", inplace=True)

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/Documents/git/alphadia/alphadia/outputtransform.py:461: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    psm_df["mod_sites"].fillna("", inplace=True)

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:691: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
    warnings.warn(

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fdr.py:403: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
    plt.show()

tests/unit_tests/test_plotting.py::test_plot_cycle
  /Users/georgwallmann/Documents/git/alphadia/alphadia/plotting/cycle.py:189: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
    cmap = cm.get_cmap(cmap_name)

tests/unit_tests/test_plotting.py::test_plot_cycle
  /Users/georgwallmann/Documents/git/alphadia/alphadia/plotting/cycle.py:46: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
    cmap = cm.get_cmap(cmap_name)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================================== short test summary info ===============================================================================================
FAILED tests/unit_tests/test_outputtransform.py::test_output_transform - IndexError: list index out of range
============================================================================== 1 failed, 53 passed, 5 deselected, 10 warnings in 34.87s ===============================================================================
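
In the traceback above, the loop runs over range(len(list_of_tuple_w_protein_profiles_and_shifted_peptides)) and indexes allprots[idx]. The IndexError therefore implies that the list of profiles has more entries than allprots. The log does not show why the lengths differ. The sketch below reproduces only that indexing pattern in plain Python and shows one possible guard. The function names are hypothetical and this is not directLFQ's actual code or its fix.

```python
def collect_protein_names_unchecked(profiles, allprots):
    """Simplified version of the indexing pattern seen in the traceback."""
    # Raises IndexError as soon as idx reaches len(allprots).
    return [allprots[idx] for idx in range(len(profiles))]


def collect_protein_names(profiles, allprots):
    """Hypothetical guard: fail early with a descriptive message on a mismatch."""
    if len(profiles) != len(allprots):
        raise ValueError(
            f"got {len(profiles)} profiles but {len(allprots)} protein names"
        )
    # zip pairs the two sequences without positional indexing.
    return [name for name, _profile in zip(allprots, profiles)]


if __name__ == "__main__":
    profiles = [("profile_0",), ("profile_1",), ("profile_2",)]
    try:
        collect_protein_names_unchecked(profiles, ["EPROT", "VPROT"])
    except IndexError as err:
        print("unchecked:", err)
    try:
        collect_protein_names(profiles, ["EPROT", "VPROT"])
    except ValueError as err:
        print("checked:", err)
```

A length check like this would not fix the underlying mismatch, but it would turn the bare IndexError into a message that names both lengths, which might make a report like this one easier to triage.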

ammarcsj commented 9 months ago

Fixed in the new version.