ISWC-Reproducibility-Track / Paper_611


Problem running csv2rdf on huse #2

Closed annalina closed 3 years ago

annalina commented 4 years ago

I've got the following problem (note that csv2rdf on HSUP worked fine):

Parsing file: EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.csv
Traceback (most recent call last):
  File "/home/annalina/test/bin/csv2rdf-cli", line 11, in <module>
    load_entry_point('EXIOBASE-conversion-software==0.5', 'console_scripts', 'csv2rdf-cli')()
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/bin/csv2rdf_cli.py", line 57, in main
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/__init__.py", line 22, in conversion
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/csv2rdf.py", line 422, in csv2rdf
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.csv does not exist: 'EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.csv'
mv: cannot stat ‘EXIOBASE_conversion_software/data/flows_merged.nt’: No such file or directory
gzip: output/exiobase_huse.nt: No such file or directory

Any suggestions?

IKnowLogic commented 4 years ago

It seems the file MR_HUSE_2011_v3_3_17.csv does not exist. This file is created when running the script:

excel2csv-cli -i EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.xlsb -o EXIOBASE_conversion_software/data/

What files are currently located in the folder EXIOBASE_conversion_software/data?
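A quick way to answer this, and to see where the relative path actually points, is a small check like the one below. It is only a sketch: the helper name `describe_input` is made up for illustration and is not part of the conversion software. The one thing it relies on is that a relative path such as `EXIOBASE_conversion_software/data/...` is resolved against the current working directory, so the same command can succeed or fail depending on the folder it is started from.

```python
from pathlib import Path


def describe_input(relative_path, cwd=None):
    """Report how a relative input path resolves and what sits next to it.

    `describe_input` is a hypothetical helper, not part of the
    EXIOBASE conversion software.
    """
    # Relative paths are resolved against the working directory.
    base = Path(cwd) if cwd is not None else Path.cwd()
    target = (base / relative_path).resolve()
    folder = target.parent

    # List the folder contents only if the folder is really there.
    siblings = sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []

    return {
        "resolved": str(target),
        "exists": target.is_file(),
        "folder_exists": folder.is_dir(),
        "siblings": siblings,
    }
```

Calling `describe_input("EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.csv")` from the folder where the command was run shows the absolute path that was looked up, whether the file is there, and which files the data folder contains.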

annalina commented 4 years ago

The directory contains these files:

exiobase_classifications_v_3_3_17.xlsx  MR_HUSE_2011_v3_3_17_18.nt  MR_HUSE_2011_v3_3_17_26.nt  MR_HUSE_2011_v3_3_17_5.nt
MR_HUSE_2011_v3_3_17_10.nt              MR_HUSE_2011_v3_3_17_19.nt  MR_HUSE_2011_v3_3_17_27.nt  MR_HUSE_2011_v3_3_17_6.nt
MR_HUSE_2011_v3_3_17_11.nt              MR_HUSE_2011_v3_3_17_1.nt   MR_HUSE_2011_v3_3_17_28.nt  MR_HUSE_2011_v3_3_17_7.nt
MR_HUSE_2011_v3_3_17_12.nt              MR_HUSE_2011_v3_3_17_20.nt  MR_HUSE_2011_v3_3_17_29.nt  MR_HUSE_2011_v3_3_17_8.nt
MR_HUSE_2011_v3_3_17_13.nt              MR_HUSE_2011_v3_3_17_21.nt  MR_HUSE_2011_v3_3_17_2.nt   MR_HUSE_2011_v3_3_17_9.nt
MR_HUSE_2011_v3_3_17_14.nt              MR_HUSE_2011_v3_3_17_22.nt  MR_HUSE_2011_v3_3_17_30.nt  MR_HUSE_2011_v3_3_17.csv
MR_HUSE_2011_v3_3_17_15.nt              MR_HUSE_2011_v3_3_17_23.nt  MR_HUSE_2011_v3_3_17_31.nt
MR_HUSE_2011_v3_3_17_16.nt              MR_HUSE_2011_v3_3_17_24.nt  MR_HUSE_2011_v3_3_17_3.nt
MR_HUSE_2011_v3_3_17_17.nt              MR_HUSE_2011_v3_3_17_25.nt  MR_HUSE_2011_v3_3_17_4.nt

I've also tried to run excel2csv on the HUSE dataset again, but now I get the following error:

Traceback (most recent call last):
  File "/home/annalina/test/bin/excel2csv-cli", line 11, in <module>
    load_entry_point('EXIOBASE-conversion-software==0.5', 'console_scripts', 'excel2csv-cli')()
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/bin/excel2csv_cli.py", line 31, in main
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/__init__.py", line 20, in conversion
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/excel2csv.py", line 165, in excel2csv
  File "/home/annalina/test/lib/python3.6/site-packages/EXIOBASE_conversion_software-0.5-py3.6.egg/EXIOBASE_conversion_software/excel2csv.py", line 90, in xlsb2csv
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 824, in __init__
    self._reader = self._engines[engine](self._io)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/excel/_pyxlsb.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "/home/annalina/test/lib/python3.6/site-packages/pandas/io/excel/_pyxlsb.py", line 36, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "/home/annalina/test/lib/python3.6/site-packages/pyxlsb/__init__.py", line 10, in open_workbook
    zf = ZipFile(name, 'r')
  File "/home/support/apps/cports/rhel-7.x86_64/gnu/Python/3.6.6/lib/python3.6/zipfile.py", line 1090, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.xlsb'

However, the first time, before running csv2rdf, it was successful:

Parsing file: EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.xlsb
Parsed sheet has size (9892, 7877)
Parsed 0
Parsed 50
Parsed 100
Parsed 150
Parsed 200
...
Parsed 9800
Parsed 9850
Saving to EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.csv

IKnowLogic commented 4 years ago

You can't run the excel2csv script because the MR_HUSE_2011_v3_3_17.xlsb file is not in the data folder.

Next steps:

1: From the root folder for all repos, download the data again with these commands:

wget 'https://silo1.sciencedata.dk/themes/deic_theme_oc7/apps/files_sharing/public.php?service=files&t=20ee45e130a37e87c5b19e07b81b61ec&path=%2Fexiobase-3.3.17&files=EXIOBASE_3.3.17_hsut_2011.zip&download&g=' -O exiobase-dataset.zip

unzip exiobase-dataset.zip
rm -rf exiobase-dataset.zip

mv EXIOBASE_3.3.17_hsut_2011/MR_HUSE_2011_v3_3_17.xlsb EXIOBASE-conversion-software/EXIOBASE_conversion_software/data/

2: Enter the EXIOBASE-conversion-software folder and run the following command:

excel2csv-cli -i EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.xlsb -o EXIOBASE_conversion_software/data/

This creates the CSV file in the data folder, which was missing before.

3: Now you can continue from this command, which extracts RDF data from the CSV file:

csv2rdf-cli -i EXIOBASE_conversion_software/data/MR_HUSE_2011_v3_3_17.csv -o EXIOBASE_conversion_software/data/ -c HUSE --flowtype input --multifile 100000 --merge True

It helps a lot to run only one command at a time, since a later command can interfere with the workflow if the previous one did not run correctly.
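If you would rather script this than type each command by hand, the same idea can be sketched in Python: check that each step's input file exists, run the command, and stop at the first failure so nothing downstream touches a half-finished state. The function name `run_pipeline` and the `STEPS` list are illustrative assumptions, not part of the conversion software; the commands and flags are the ones from steps 2 and 3 above.

```python
import subprocess
from pathlib import Path


def run_pipeline(steps, runner=subprocess.run):
    """Run (required_input, command) steps in order, stopping at the first problem.

    `run_pipeline` is a hypothetical wrapper; `runner` can be replaced
    with a stand-in for dry runs.
    """
    completed = []
    for required_input, command in steps:
        # Refuse to start a step whose input file is missing.
        if not Path(required_input).is_file():
            raise FileNotFoundError(
                f"Missing input for {command[0]}: {required_input}"
            )
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failed one.
        runner(command, check=True)
        completed.append(command[0])
    return completed


# Paths are relative, so this is meant to be run from inside the
# EXIOBASE-conversion-software folder.
DATA = "EXIOBASE_conversion_software/data"
STEPS = [
    (
        f"{DATA}/MR_HUSE_2011_v3_3_17.xlsb",
        ["excel2csv-cli", "-i", f"{DATA}/MR_HUSE_2011_v3_3_17.xlsb", "-o", f"{DATA}/"],
    ),
    (
        f"{DATA}/MR_HUSE_2011_v3_3_17.csv",
        ["csv2rdf-cli", "-i", f"{DATA}/MR_HUSE_2011_v3_3_17.csv", "-o", f"{DATA}/",
         "-c", "HUSE", "--flowtype", "input", "--multifile", "100000", "--merge", "True"],
    ),
]
```

Calling `run_pipeline(STEPS)` runs both conversions in order. If the .xlsb file is missing, it fails right away with a message naming the file, rather than failing later inside csv2rdf.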

Thanks, Emil

kuzeko commented 3 years ago

@IKnowLogic @annalina can we close this issue?

IKnowLogic commented 3 years ago

@kuzeko Yes, I will close it