fedspendingtransparency / usaspending-api

Server application to serve U.S. federal spending data via a RESTful API
https://www.usaspending.gov
Creative Commons Zero v1.0 Universal
308 stars 110 forks source link

Download awards endpoint fails when specifying columns #1898

Open nhinze opened 5 years ago

nhinze commented 5 years ago

The query fails, when I specify columns to download with the:

/api/v2/download/awards

endpoint. It works fine when I set:

"columns": []

Full JSON options:

{"columns":["obligated_amount","product_or_service_code","product_or_service_code_description","naics_code","naics_description"],"filters":{"keywords":["transportation"," aviation"],"time_period":[{"start_date":"2018-06-30","end_date":"2019-06-30"}]}}

I do not get any error message. Just 500 Internal Server Error.

kbard commented 5 years ago

Can I ask which tool you're using to run this query or are you using the default GUI that comes up when you visit the URL?

nhinze commented 5 years ago

The online GUI and my code give me the same error. I use httpparty gem on RubyOnRails to access the API.

kbard commented 5 years ago

We are currently experiencing an issue with the online GUI. If you're using an external tool, be sure to set the Content-Type header to application/json.

nhinze commented 5 years ago

I did set it to JSON. The following works just fine:

{"columns":[],"filters":{"keywords":["transportation"," aviation"],"time_period":[{"start_date":"2018-06-30","end_date":"2019-06-30"}]}}

I'm trying to reduce the size of the download.

kbard commented 5 years ago

Have you tried your original query recently? It seems to be working for me this morning.

nhinze commented 5 years ago

Still not working. The status is:

running running running running running running running failed

However, I don't get anything in the error message telling me what's wrong. I'm going to try the same query in postman.

kbard commented 5 years ago

Please (re)post the exact query and URL you are using. I will try to reproduce again.

nhinze commented 5 years ago

The file should be:

https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701135515927430.zip

nhinze commented 5 years ago

The options in JSON:

{"columns":["obligated_amount","product_or_service_code","product_or_service_code_description","naics_code","naics_description"],"filters":{"keywords":["transportation","aviation"],"time_period":[{"start_date":"2018-07-01","end_date":"2019-07-01"}]}}

The response of the call to: 'https://api.usaspending.gov/api/v2/download/awards'

{"status":"ready","total_columns":null,"total_rows":null,"seconds_elapsed":null,"url":"https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701135642182405.zip","total_size":null,"file_name":"all_prime_awards_subawards_20190701135642182405.zip","message":null} https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701135642182405.zip

The call to the API:

self.class.post('/download/awards', :body => options.to_json, :headers => { 'Content-Type' => 'application/json' })

kbard commented 5 years ago

When I query https://api.usaspending.gov/api/v2/download/awards with

{
    "columns": [
        "obligated_amount",
        "product_or_service_code",
        "product_or_service_code_description",
        "naics_code",
        "naics_description"
    ],
    "filters": {
        "keywords": [
            "transportation",
            "aviation"
        ],
        "time_period": [
            {
                "start_date": "2018-07-01",
                "end_date": "2019-07-01"
            }
        ]
    }
}

in Postman, I get

{
    "url": "https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701140105877504.zip",
    "file_name": "all_prime_awards_subawards_20190701140105877504.zip",
    "total_columns": null,
    "message": null,
    "status": "ready",
    "total_rows": null,
    "seconds_elapsed": null,
    "total_size": null
}

Have you had a chance to try it in Postman yet so we can eliminate other issues?

nhinze commented 5 years ago

It get the same, what does the status give you? If I try to download the file of your query, it is not working:

https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701140105877504.zip

nhinze commented 5 years ago

Trying the status of you query, I get the more detailed error message:

https://api.usaspending.gov/api/v2/download/status/?file_name=all_prime_awards_subawards_20190701140105877504.zip

{
    "url": "https://files.usaspending.gov/generated_downloads/all_prime_awards_subawards_20190701140105877504.zip",
    "file_name": "all_prime_awards_subawards_20190701140105877504.zip",
    "total_columns": 5,
    "message": "An exception was raised while attempting to process the DownloadJob:\nTraceback (most recent call last):\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 58, in generate_csvs\n    parse_source(source, columns, download_job, working_dir, piid, zip_file_path, limit)\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 165, in parse_source\n    temp_file, temp_file_path = generate_temp_query_file(source_query, limit, source, download_job, columns)\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 290, in generate_temp_query_file\n    csv_query_annotated = apply_annotations_to_sql(generate_raw_quoted_query(source_query), source.columns(columns))\n  File \"/data-act/backend/usaspending_api/download/filestreaming/csv_generation.py\", line 337, in apply_annotations_to_sql\n    raise Exception(\"Length of alises doesn't match the columns in selects\")\nException: Length of alises doesn't match the columns in selects\n",
    "status": "failed",
    "total_rows": 296369,
    "seconds_elapsed": "11.680957",
    "total_size": null
}
kbard commented 5 years ago

Ok. You are correct. There does, indeed, seem to be an issue with downloads. With the information you provided I was able to nail down exactly where things are going awry. Thank you very much for your help.

I will discuss this issue with the project lead and post back here when I know something.

kbard commented 5 years ago

I have created two bug tickets that have been added to the prioritized bug list. I do not have an ETA.

https://federal-spending-transparency.atlassian.net/browse/DEV-2998 https://federal-spending-transparency.atlassian.net/browse/DEV-2999

Thank you again!

ross-williford commented 5 years ago

This should now be resolved