NASA-PDS / web-analytics

For EN search table, there seems to be a parsing error with log files #20

Open kaipak opened 1 year ago

kaipak commented 1 year ago

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

The datetime column in the Apache logs follows the format "[DD/MM/yyyy:HH:MM:SS -tz]". Somehow some rows are getting a '-' appended. This causes the DATE_PARSE SQL function to fail, because after the SPLIT it receives just the '-' instead of the date.

This is likely a data integrity problem. We might need a script to check for bad input. It might also be related to the blank lines appearing in the data tables.
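
A minimal sketch of what such an input check could look like, in Python. It assumes the datetime values use the standard Apache layout with an abbreviated month name and a numeric offset (for example "[10/Oct/2023:13:55:36 -0700]"), which may not match the EN logs exactly. The names `APACHE_TS`, `is_valid_timestamp` and `find_bad_rows` are hypothetical and not part of this repo.

```python
import re
from datetime import datetime

# Assumption: values look like "[10/Oct/2023:13:55:36 -0700]".
APACHE_TS = re.compile(
    r"^\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]$"
)


def is_valid_timestamp(value):
    """Return True if value is a parseable Apache-style timestamp."""
    if not isinstance(value, str):
        return False
    value = value.strip()
    # Cheap shape check first; rejects "-" and blank values.
    if not APACHE_TS.match(value):
        return False
    # Full parse catches impossible dates such as 31/Feb.
    try:
        datetime.strptime(value, "[%d/%b/%Y:%H:%M:%S %z]")
    except ValueError:
        return False
    return True


def find_bad_rows(values):
    """Return (index, value) pairs for rows that fail the check."""
    return [(i, v) for i, v in enumerate(values) if not is_valid_timestamp(v)]
```

Running `find_bad_rows` over the datetime column before the table is built would list the offending rows (including bare "-" and blank values), so they can be inspected rather than silently filtered out.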

🕵️ Expected behavior

Table should be created. A workaround that filters out rows whose value is "-" has been implemented; however, this results in the loss of 80k rows.

📜 To Reproduce

  1. On EN logs, run DATE_PARSE on SPLIT of datetime column: DATE(DATE_PARSE(SPLIT(datetime, ' ')[1], '[%d/%M/%Y:%H:%i:%s')) as date,

🖥 Environment Info

No response

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

jordanpadams commented 11 months ago

Workaround implemented. Will eventually want to do some more extensive cleaning here.