apprenticelearner / AL_Train

A repository for the CTAT HTML based training harness for Apprentice Learner agents.
MIT License

Problem Name and Dataset Level columns are blank in log output #4

Closed cmaclell closed 4 years ago

cmaclell commented 5 years ago

Occasionally, columns in the output log are blank.

eharpste commented 5 years ago

We have a new instance of this problem that suggests it's related to leftover actions after a correct done button press. Can we confirm this is a reliable piece of the issue?

DannyWeitekamp commented 5 years ago

This is not likely an issue with AL_autorun.js, since all of the logging parameters are passed to CTAT through query strings in the URL. It's more likely that there was an error parsing a particular message, or that the message wasn't sent at all by CTAT.

annarafferty commented 5 years ago

Here's an example log with the issue - it looks like it might occur after a done press, although I'm not sure if that's the issue: outer_loop_test_bktLog-2019-07-17-16_01_01.txt

DannyWeitekamp commented 5 years ago

So I've been able to replicate this issue on Windows, and not on Linux. It seems that the issue has to do with how quickly each operating system handles logging requests. Windows is taking its sweet time on a few things:

1) Resolving the localhost domain name (can be fixed by changing to 127.0.0.1)
2) Perhaps parsing the XML and formatting the data
3) Perhaps writing the data to disk
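For item 1, a minimal sketch of swapping `localhost` for `127.0.0.1` in a logging URL, so that no name lookup is needed. The helper name `use_loopback_ip` and the example port and path are hypothetical; nothing here is taken from the AL_Train code.

```python
from urllib.parse import urlsplit, urlunsplit


def use_loopback_ip(url):
    """Return url with a 'localhost' host replaced by 127.0.0.1.

    Hypothetical helper for illustration. Any user:password part of
    the URL is dropped, which is assumed to be fine for a local logger.
    """
    parts = urlsplit(url)
    if parts.hostname != "localhost":
        return url  # leave every other host untouched
    netloc = "127.0.0.1"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Example with a made-up port and path:
print(use_loopback_ip("http://localhost:8000/log?x=1"))
```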

I've written a unit test that exercises this issue in the /unittests folder. The only real solution to this is to write a more substantial logger that can handle multithreaded, asynchronous requests. Someone motivated could move the logger over to Flask or something like that; I won't be able to take care of this until after the CHI deadline. If people are desperate, I imagine running a Linux server on AWS with a Selenium browser would fix the issue. However, as the code gets faster, I imagine Linux will also run into issues with the logging requests coming in too fast.
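To make the "more significant logger" idea concrete, here is a rough sketch of one common pattern: request handlers only push rows onto a thread-safe queue, and a single background thread does all the file writing. The class name `QueuedLogWriter` and the tab-separated row format are assumptions for illustration, not the existing AL_Train logger, and this has not been tested against the Windows behaviour described above.

```python
import queue
import threading


class QueuedLogWriter:
    """Hypothetical logger: many threads enqueue rows, one thread writes."""

    _STOP = object()  # sentinel that tells the writer thread to exit

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "a", encoding="utf-8", newline="")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, row):
        """Safe to call from any request thread; does no file I/O itself."""
        self._queue.put("\t".join(str(value) for value in row) + "\n")

    def _drain(self):
        # Only this thread touches the file, so rows are never interleaved.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._file.write(item)
        self._file.flush()

    def close(self):
        """Write everything still queued, then stop the thread and close."""
        self._queue.put(self._STOP)
        self._thread.join()
        self._file.close()
```

A request handler would call `writer.log([...])` and return right away; `writer.close()` on shutdown drains whatever is still queued before the file is closed.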

Additionally, I should mention that this issue is not just a matter of missing data in some of the rows. If you carefully count the number of recorded transactions you will see that some events simply were not recorded. So the issue cannot be solved by just removing the offending rows.
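A quick way to check both symptoms at once is to compare the number of recorded rows against the number of transactions you expected, and to list the rows with blank columns. The sketch below assumes a tab-delimited log with a header row and uses the column names from the issue title; the function name `audit_log` and the sample data are made up.

```python
import csv
import io


def audit_log(text, expected_count, required=("Problem Name", "Dataset Level")):
    """Return (indices of rows with a blank required column, missing row count).

    Assumes tab-delimited text whose first line is a header.
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    blank = [
        index
        for index, row in enumerate(rows)
        if any(not (row.get(column) or "").strip() for column in required)
    ]
    missing = expected_count - len(rows)
    return blank, missing


# Made-up sample: 5 transactions were sent, 3 were recorded, 1 has blanks.
sample = (
    "Transaction Id\tProblem Name\tDataset Level\n"
    "1\tp1\tL1\n"
    "2\t\t\n"
    "3\tp1\tL1\n"
)
blank_rows, missing_rows = audit_log(sample, expected_count=5)
print(blank_rows, missing_rows)
```

A nonzero `missing_rows` is the case described above: dropping the blank rows would still leave gaps in the transaction sequence.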

eharpste commented 5 years ago

Is it possible that some of the platform-dependent delay comes from reopening and disposing of the file writer for every log request? Some cursory searching (admittedly really old: https://stackoverflow.com/questions/1842798/python-performance-on-windows) seems to suggest there are platform-dependent speed issues with file I/O operations. If we just maintained a single writer object the whole time, would the problem just go away?
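For reference, the two approaches being compared might look roughly like this. Both `append_reopening` and `PersistentLogFile` are hypothetical names, not the current logger code; the sketch only shows the difference in how the file handle is managed.

```python
def append_reopening(path, line):
    """Open, write, and close the file on every log request."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


class PersistentLogFile:
    """Keep a single handle open for the lifetime of the logger."""

    def __init__(self, path):
        self._handle = open(path, "a", encoding="utf-8")

    def write_line(self, line):
        self._handle.write(line + "\n")
        # Flush so a crash loses at most the line currently being written.
        self._handle.flush()

    def close(self):
        self._handle.close()
```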

DannyWeitekamp commented 5 years ago

Could be. I recreate the file handle every time, so it's worth a shot.

DannyWeitekamp commented 5 years ago

^ tried this today. Didn’t fix it.

eharpste commented 4 years ago

@DannyWeitekamp is pretty sure this is fixed.