ebi-ait / dcp-ingest-central

Central point of access for the Ingestion Service of the HCA DCP
Apache License 2.0

HCA-to-scea tools not working for GSE135893_HCAD15 #513

Closed: ESapenaVentura closed this issue 2 years ago.

ESapenaVentura commented 3 years ago

Describe the bug

When running the hca-to-scea tool, I run into the following error:

Traceback (most recent call last):
  File "hca2scea.py", line 584, in <module>
    main()
  File "hca2scea.py", line 581, in main
    create_magetab(work_dir, tracking_sheet, xlsx_dict, dataset_protocol_map, df, args)
  File "hca2scea.py", line 444, in create_magetab
    generate_sdrf_file(work_dir, args, df, dataset_protocol_map, sdrf_file_name)
  File "hca2scea.py", line 378, in generate_sdrf_file
    sdrf_3.insert(idx, col, list(protocols_sdrf_before_sequencing[col]))
  File "/data/tools/hca-to-scea-tools/hca2scea-backend/venv/lib/python3.6/site-packages/pandas/core/frame.py", line 3627, in insert
    value = self._sanitize_column(column, value, broadcast=False)
  File "/data/tools/hca-to-scea-tools/hca2scea-backend/venv/lib/python3.6/site-packages/pandas/core/frame.py", line 3768, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "/data/tools/hca-to-scea-tools/hca2scea-backend/venv/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 748, in sanitize_index
    "Length of values "
ValueError: Length of values (1691) does not match length of index (83)
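For context, this ValueError is raised by pandas itself. DataFrame.insert requires the inserted values to have exactly one entry per row of the target frame. Here, a 1691-element list built from protocols_sdrf_before_sequencing[col] is being inserted into sdrf_3, which has 83 rows. One plausible cause is that the protocol table holds one row per file or suspension while the SDRF frame holds one row per sample, but this is an assumption that the thread does not confirm. The following minimal sketch reproduces the error on toy data and shows one way to align values by a shared key instead of by position. The helper names, the "Source Name" key column, and the sample values are hypothetical, and this is not the change made in the pull request linked later in this thread.

```python
import pandas as pd


def insert_positionally(sdrf: pd.DataFrame, protocols: pd.DataFrame, col: str) -> pd.DataFrame:
    """Mimic the failing pattern: copy a column across as a plain list, by position.

    DataFrame.insert raises ValueError whenever len(values) != len(sdrf).
    """
    out = sdrf.copy()
    out.insert(len(out.columns), col, list(protocols[col]))
    return out


def insert_by_key(sdrf: pd.DataFrame, protocols: pd.DataFrame, col: str,
                  key: str = "Source Name") -> pd.DataFrame:
    """Align protocol values to SDRF rows via a shared key (hypothetical key column).

    Identical (key, value) pairs are collapsed first. validate="one_to_one" then
    raises pandas.errors.MergeError if a key still maps to several different
    values, instead of silently picking one of them.
    """
    per_key = protocols[[key, col]].drop_duplicates()
    return sdrf.merge(per_key, on=key, how="left", validate="one_to_one")


if __name__ == "__main__":
    # Toy data: 3 samples in the SDRF, 5 protocol rows (e.g. one per file).
    sdrf = pd.DataFrame({"Source Name": ["s1", "s2", "s3"]})
    protocols = pd.DataFrame({
        "Source Name": ["s1", "s1", "s2", "s2", "s3"],
        "Protocol REF": ["P-1", "P-1", "P-2", "P-2", "P-3"],
    })

    try:
        insert_positionally(sdrf, protocols, "Protocol REF")
    except ValueError as err:
        print("positional insert failed:", err)

    print(insert_by_key(sdrf, protocols, "Protocol REF"))
```

Running it prints the ValueError for the positional insert and a three-row frame for the key-based one. Failing loudly on conflicting values is a deliberate choice here: a mismatch between the two tables usually means that the spreadsheet and the code disagree about the row granularity, which is worth surfacing rather than hiding.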

To Reproduce

Steps to reproduce the behaviour:

  1. Log in to the EC2 instance
  2. Activate the hca-to-scea environment
  3. cd to /data/tools/hca-to-scea-tools/hca2scea-backend
  4. Run python3 hca2scea.py -s /home/eventura/GSE135893_HCAD15.xlsx -id c1a9a93d-d9de-4e65-9619-a9cec1052eaa -study SRP218543 -ac 15 -c ESV -tt 10Xv2_3 -et differential -f diseases -pd 2019-09-05 -hd 2020-07-15 --facs
  5. See the error

Expected behaviour

It should convert the dataset.

Environment

Source spreadsheet: GSE135893_HCAD15.xlsx

ESapenaVentura commented 3 years ago

@ami-day to look into this when back.

ESapenaVentura commented 3 years ago

@amnonkhen to take a look

amnonkhen commented 3 years ago

I am able to reproduce the problem in my environment. Now to investigate!

gabsie commented 3 years ago

Hey @ami-day, as this is now in your hands, could you please post progress updates here and check with @ESapenaVentura on how to unblock him? Thank you!

ami-day commented 3 years ago

I'll have a look at this tomorrow, as we have the DCP meeting now.

yusra-haider commented 3 years ago

@ami-day worked on it and fixed it. The fix is to be reviewed/tested by @ESapenaVentura.

The changes are to be reviewed by the dev team as well.

amnonkhen commented 3 years ago

@ami-day made the fixes on Friday and will create a PR for review.

ESapenaVentura commented 3 years ago

I still get the same error.

idazucchi commented 3 years ago

@ami-day to check that the right branch is installed on the EC2 instance.

ami-day commented 3 years ago

Pull request: https://github.com/ebi-ait/hca-to-scea-tools/pull/57

ofanobilbao commented 3 years ago

@ESapenaVentura has not yet had the chance to test whether the fixes resolve his issue.

gabsie commented 2 years ago

Hey @ESapenaVentura :) did this work?

ke4 commented 2 years ago

@ESapenaVentura tested it and it worked. He is going to test it with another dataset today.

ke4 commented 2 years ago

@amnonkhen needs to find the files and communicate with wranglers.

MightyAx commented 2 years ago

@amnonkhen to continue this.