Closed jefw closed 8 years ago
Signed the agreement via clahub.
Thank you! We will take a look at these.
As they say, no good deed goes unpunished. Any chance you want to take a swing at updating the corresponding page in the documentation? If not, that is totally fine.
Yes, no problem. Just glancing at the docs made me realize A_RunETL.bat needs changes anyway. I only took it as far as capturing and outputting the command, not actually running it. There's no real equivalent to eval in CMD.EXE, so I may have to resort to a temporary file.
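For context, this is the shell-side idiom that has no direct CMD.EXE counterpart: a command string assembled at runtime and executed with eval. A minimal sketch (the command string and argument here are hypothetical, not from the actual scripts):

```shell
# A command built up at runtime is executed with eval in sh;
# CMD.EXE has no direct equivalent, hence the temporary-file workaround.
CMD="echo running job"   # hypothetical command string
ARG="f7f2-ggz5"
eval "$CMD $ARG"
```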
Okay - I've updated the docs and the A_RunETL.bat to run the commands. Out of curiosity, what do you intend for the RunETL script to do in case multiple jobs are matched? I imagine it would be a fairly common failure mode for the user to accidentally press enter prematurely, maybe giving something like:
$ sh RunETL.sh t
[oops I hit enter while aiming for the "3" on the keypad]
In this case the script would match any jobs with a "t" - which could be a lot of stray jobs.
The shell script would compound all this onto one line, due to the use of eval, and the command would almost certainly fail to run.
The batch file, however, would run each job one by one, since the tasks are parsed out in a loop with the call made inside it.
Let me know what the desired behavior is, or raise a new issue, and I can tackle this if you want.
Reasonable point. The desired behavior would be to fail without running any jobs.
I have not really thought through ease of implementation in either a shell script or batch file, but I suppose a good validity check would be to make sure the parameter is nine characters (good) or four alphanumerics, a dash, and four alphanumerics (better). Nine characters with the middle one being a dash would be almost as good.
All that said, I have done much worse things by mistake than running some ETLs needlessly, so we can also live with the risk if necessary. ☺
Thanks.
Added validation requiring the Dataset4x4 argument to be (exactly) 4 alphanum dash 4 alphanum per your suggestion. Hooray regex. I think this is a reasonable safeguard. It's not so much running an ETL that I would worry about, it's other stuff that may be lurking in the crontab or task scheduler.
A_DatasetLogs.bat
- looks great.
A_ETLRuntimes.bat
- something is not quite right. Here is a zipped folder of sample logs (sorry, this would have been helpful to have earlier); the commands didn't quite work, and the parsing seems to stop after the header.
A_TodayLogs.bat
- terrific.
For A_ETLRuntimes.bat
Shell output (correct):
$ A_ETLRuntimes.sh f7f2-ggz5
INFO 01-08 06:00:40,695 - Kitchen - Processing ended after 19 seconds.
INFO 02-08 06:00:39,526 - Kitchen - Processing ended after 18 seconds.
Batch file output:
> A_ETLRuntimes.bat k7hf-8y75
WARN 01-08 00:45:20,949 - Unable to load Hadoop Configuration from "file:///path/to/directory/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/mapr". For more information enable debug logging.
INFO 01-08 00:45:21,130 - Kitchen - Logging is at level : Detailed logging
INFO 01-08 00:45:21,131 - Kitchen - Start of run.
INFO 01-08 00:45:21,240 - Standard_ETL - Start of job execution
INFO 01-08 00:45:21,242 - Standard_ETL - exec(0, 0, START.0)
INFO 01-08 00:45:21,246 - START - StartinNot enough storage is available to process this command.
In another example:
$ A_ETLRuntimes.sh k7hf-8y75
INFO 01-08 00:45:31,837 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 01:45:31,532 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 02:45:35,323 - Kitchen - Processing ended after 13 seconds.
INFO 01-08 03:45:31,021 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 04:45:33,491 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 05:45:35,705 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 06:45:30,327 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 07:45:30,401 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 08:45:30,755 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 09:45:30,992 - Kitchen - Processing ended after 11 seconds.
INFO 01-08 10:45:30,064 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 11:45:32,956 - Kitchen - Processing ended after 13 seconds.
INFO 01-08 12:45:29,006 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 13:45:31,072 - Kitchen - Processing ended after 12 seconds.
INFO 01-08 14:45:34,327 - Kitchen - Processing ended after 13 seconds.
INFO 01-08 15:45:32,042 - Kitchen - Processing ended after 13 seconds.
INFO 01-08 16:45:36,774 - Kitchen - Processing ended after 15 seconds.
INFO 01-08 17:45:30,050 - Kitchen - Processing ended after 11 seconds.
INFO 01-08 18:45:30,543 - Kitchen - Processing ended after 10 seconds.
INFO 01-08 19:45:30,866 - Kitchen - Processing ended after 11 seconds.
INFO 01-08 20:45:30,196 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 21:45:27,508 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 22:45:27,525 - Kitchen - Processing ended after 9 seconds.
INFO 01-08 23:45:30,555 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 00:45:31,668 - Kitchen - Processing ended after 9 seconds.
INFO 02-08 01:47:31,540 - Kitchen - Processing ended after 2 minutes and 14 seconds (134 seconds total).
INFO 02-08 02:45:29,161 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 03:45:28,886 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 04:45:32,704 - Kitchen - Processing ended after 9 seconds.
INFO 02-08 05:45:35,135 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 06:45:30,211 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 07:45:28,853 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 08:45:28,631 - Kitchen - Processing ended after 9 seconds.
INFO 02-08 09:45:32,719 - Kitchen - Processing ended after 14 seconds.
INFO 02-08 10:45:26,688 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 11:45:28,690 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 12:45:28,745 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 13:45:28,885 - Kitchen - Processing ended after 10 seconds.
INFO 02-08 14:45:27,470 - Kitchen - Processing ended after 10 seconds.
The bash script has the same Hadoop error as above. The grep/find doesn't seem to be working, but it's unclear to me why.
Okay - I'll take a look and debug using your sample logs. I'm pretty booked for the next couple of weeks, but will work on it catch-as-catch-can.
@tomschenkjr - can you double check that those sample files are still available on FileTea? I get a blank page when following the URL you posted.
@jefw -- Ok, adjusted the link. That should work.
@tomschenkjr - Retrieved in good order - thanks.
Okay - fixed in my branch. I was using copy to get file content into the find command, but this was stopping after the first file. Switched to type instead. I have a dim recollection that type is limited to files < 2GB but couldn't quickly verify this. The .bat now outputs the same as the .sh when tested with k7hf-8y75. Please re-test and advise.
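For reference, the shell side that the fixed .bat now mirrors boils down to concatenating every matching log file and filtering for the runtime lines (the log directory and file-naming pattern below are assumptions for illustration, not the kit's actual layout):

```shell
#!/bin/sh
# Shell analogue of the fixed batch logic: concatenate all matching
# log files (cf. `type` in the .bat), then keep only runtime lines.
# The Kitchen_<id>*.log naming pattern is a hypothetical example.
ID="k7hf-8y75"
cat "Kitchen_${ID}"*.log | grep "Processing ended after"
```

The earlier bug was the batch equivalent of this step reading only the first file before stopping.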
I am not sure what you mean about the Hadoop errors. These appear in the log files themselves, and should be excluded by grep/find.
Thanks, I'll test to confirm.
For the Hadoop errors: when I was running the command, it was displaying a Hadoop error contained in the logs, which just happened to be the first line.
Success :+1:
Thank you.
Awesome!
Per Issue 3, created A_DatasetLogs.bat, A_ETL_Runtimes.bat, A_RunETL.bat, A_TodayLogs.bat to mimic the functionality of the corresponding shell scripts. Care has been taken to avoid dependencies other than a modern Windows command interpreter. However:
Without a set of example log files, testing has been very limited; please construct your test plan accordingly.