datasets / s-and-p-500-companies

List of companies in the S&P 500 together with associated financials
https://datahub.io/core/s-and-p-500-companies

Issue with Makefile #35

Closed markgahagan1 closed 1 year ago

markgahagan1 commented 1 year ago

I am trying to work out why the Makefile fails. Because of its age, it appears to need a number of amendments.

Issues seem to be:

  1. The file locations are incorrect, as one of the referenced files doesn't exist, e.g. ../data/constituents-financials.csv. If that reference is removed, the Makefile will at least create the 2 output files for constituents, provided you change the tmp directory path in constituents.py at lines 12, 13 and 16 from "scripts/tmp" to "../scripts/tmp". (This also permits constituents.py to be run from the scripts directory in a terminal.)

  2. The test_data.py file relies on goodtables, which has since been deprecated. I have tried to get my head around this file and replace goodtables with the 'frictionless' framework package from Frictionless Data (which I believe replaced goodtables), but I am afraid neither my Python nor my computer science ability is yet up to that challenge.

  3. In summary, my suggestion would be to rewrite test_data.py using the frictionless framework package for validation, then rewrite the Makefile to reference the amended data locations, and finally amend lines 12, 13 and 16 in constituents.py to reference the tmp directory correctly. I realise this is a big ask, as it is way out of my league, so I understand the workload.
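For the tmp directory problem in point 1, one option is to derive the paths from the script's own location instead of the working directory, so the script behaves the same whether it is started from the repository root or from the scripts directory. This is only a sketch under assumptions: the function name resolve_dirs and the returned layout (scripts/tmp and data next to scripts) are hypothetical and not taken from constituents.py.

```python
from pathlib import Path


def resolve_dirs(script_path):
    """Return (tmp_dir, data_dir) relative to the script, not the cwd.

    Assumed layout (hypothetical, adjust to the real repository):
        <repo>/scripts/constituents.py
        <repo>/scripts/tmp/
        <repo>/data/
    """
    script_dir = Path(script_path).resolve().parent
    tmp_dir = script_dir / "tmp"
    data_dir = script_dir.parent / "data"
    return tmp_dir, data_dir


def ensure_dirs(script_path):
    """Create both directories if they are missing and return them."""
    tmp_dir, data_dir = resolve_dirs(script_path)
    for directory in (tmp_dir, data_dir):
        directory.mkdir(parents=True, exist_ok=True)
    return tmp_dir, data_dir
```

Inside constituents.py this would be called as ensure_dirs(__file__), which removes the need to edit the hard-coded "scripts/tmp" strings depending on where the script is launched from.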

My attempt at updating the Makefile is below. It creates the 2 output files in the correct directory, then errors at the goodtables imports in test_data.py:

MAKEFILE:

all: pushed.txt

../data:
	mkdir ../data

../data List_of_S%26P_500_companies.html: constituents.py
	python constituents.py

../data/constituents.csv: ../data List_of_S%26P_500_companies.html constituents.py
	python constituents.py

valid.txt: ../data/constituents.csv ../datapackage.json test_data.py
	python test_data.py
	echo "Datapackage is valid" > valid.txt

pushed.txt: valid.txt
	git add ../data/constituents.csv ../data/constituents-financials.csv
	git add ../data/constituents_symbols.txt ../data/constituents-symbols.txt
	git commit -m "[data][skip ci] automatic update" || exit 0
	git push publish
	echo "Update has been pushed if there was a change" > pushed.txt

.PHONY: all
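For the goodtables failure in test_data.py described above, a replacement check might look roughly like the sketch below. It assumes the frictionless package is installed and relies only on its validate function and the valid attribute of the returned report. The function name validate_datapackage, the default descriptor path and the injectable validate parameter are hypothetical and not part of the repository.

```python
def validate_datapackage(descriptor="../datapackage.json", validate=None):
    """Return True if the data package validates, False otherwise.

    `validate` can be injected for testing. By default the validate
    function from the frictionless package is used (assumed installed).
    """
    if validate is None:
        # Imported lazily so this module loads even without frictionless.
        from frictionless import validate

    report = validate(descriptor)
    if not report.valid:
        # Print the whole report so the failing resource can be inspected.
        print(report)
    return report.valid
```

A script such as test_data.py could then call validate_datapackage() and exit with a non-zero status when it returns False, which is what the valid.txt rule needs in order to stop the build.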

rufuspollock commented 1 year ago

@markgahagan1 very happy to radically simplify this, e.g. if you prefer Python we can dump the Makefile and just have a simple Python file.
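As a rough illustration of what such a simple Python file could look like (a sketch only, not what the linked pull request ended up doing), the steps from the Makefile attempt can be listed as commands and run in order. It assumes the script is started from the scripts directory and that a git remote named publish exists, as in the Makefile above. The function names build_steps and run, and the choice to add only ../data/constituents.csv, are hypothetical.

```python
import subprocess
import sys


def build_steps(python=sys.executable):
    """Return the update commands in the order they should run."""
    return [
        [python, "constituents.py"],
        ["git", "add", "../data/constituents.csv"],
        ["git", "commit", "-m", "[data][skip ci] automatic update"],
        ["git", "push", "publish"],
    ]


def run(steps, runner=subprocess.run):
    """Run each command and return an exit code for the whole update.

    A failing `git commit` (typically: nothing changed) is treated as
    success and ends the run without pushing. Any other failure stops
    the run and its return code is passed on.
    """
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            if cmd[:2] == ["git", "commit"]:
                return 0
            return result.returncode
    return 0
```

Calling sys.exit(run(build_steps())) at the bottom of the file would make it usable from a scheduler or CI job in place of make.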

Don't worry too much about the test for now.

/cc @PhilippeduPreez

markgahagan1 commented 1 year ago

Yes that sounds great thanks

davidgasquez commented 1 year ago

Closing this as https://github.com/datasets/s-and-p-500-companies/pull/36 changed the Makefile entirely.