Processing the iso3166.csv file
We are supposed to add a script for downloading the datasets from the unstats.un.org website. I am currently busy fixing the whole codebase, so for now I am handling this the same way the original author did. (I will update it as soon as I finish with the current codebase.)
Added scripts/config.py file for easy access to all the links and column_headers
Added scripts/iso3166.py script for easy creation of iso3166.json file
Updated the scripts/edgar.py using selenium for parsing the sec.gov website
I am using selenium because sec.gov's sitemap disallows parsing through bs4 or requests. The API only works for certain pages and is not available for this path: /submit-filings/filer-support-resources/edgar-state-country-codes. On GITHUB_ACTIONS there might be problems opening a browser in non-headless mode, which is needed because parsing does not work in --headless mode. The solution is to use Xvfb when setting up the pipeline.
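To make the Xvfb idea concrete, here is a minimal sketch of how the pipeline could launch scripts/edgar.py under a virtual display. It is not part of this PR: the helper name build_command is hypothetical, and it assumes the xvfb-run wrapper is installed on the runner. GitHub Actions sets GITHUB_ACTIONS=true on its runners, which is what the check relies on.

```python
import os
import shutil
import sys


def build_command(script, env=None, xvfb_path=None):
    """Return the argv list for running `script`, wrapped in xvfb-run on CI.

    `build_command` is a hypothetical helper, not something in this PR.
    """
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    on_ci = env.get("GITHUB_ACTIONS") == "true"
    if xvfb_path is None:
        # Look up xvfb-run on PATH; None means it is not installed.
        xvfb_path = shutil.which("xvfb-run")
    command = [sys.executable, script]
    if on_ci and xvfb_path:
        # -a lets xvfb-run pick a free display number automatically.
        command = [xvfb_path, "-a"] + command
    return command
```

The resulting list can be passed to subprocess.run(..., check=True). Locally, where GITHUB_ACTIONS is not set, the script runs directly and the browser opens on the real display.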
@anuveyatsu please review this for now. I will make a more granular PR, probably fixing only scripts/iso3166.py; the other files should be good, and I tested everything locally.
Changes made:
scripts/config.py: file for easy access to all the links and column_headers
scripts/iso3166.py: script for easy creation of the iso3166.json file
scripts/edgar.py: uses selenium for parsing the sec.gov website. sec.gov's sitemap disallows parsing through bs4 or requests, and the API only works for certain pages and is not available for this path: /submit-filings/filer-support-resources/edgar-state-country-codes. On GITHUB_ACTIONS there might be problems opening a browser in non-headless mode, which is needed because parsing does not work in --headless mode; the solution is to use Xvfb when setting up the pipeline
scripts/unterm_names.py: script
scripts/format_json.py: script
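For reviewers, a rough sketch of the CSV-to-JSON step that scripts/iso3166.py performs, using only the standard library. This is an illustration and not the code in the PR: the COLUMN_HEADERS mapping and both function names are hypothetical, and the real headers live in scripts/config.py.

```python
import csv
import io
import json

# Hypothetical header mapping (CSV header -> JSON key); the real one is in scripts/config.py.
COLUMN_HEADERS = {"Name": "name", "Alpha-2": "alpha2"}


def csv_to_records(csv_text, column_headers=COLUMN_HEADERS):
    """Parse CSV text into a list of dicts, renaming headers found in the mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Headers missing from the mapping are kept unchanged.
        records.append({column_headers.get(key, key): value for key, value in row.items()})
    return records


def records_to_json(records):
    """Serialize records as indented JSON, keeping non-ASCII country names readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)
```

Keeping the header mapping in one place means a renamed column in the source CSV only needs a change in the config file.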