Implement an initial Derby schema for recording log parse events on the server

GoogleCodeExporter commented 9 years ago

This is the first step towards completing Issue 165. This task implements a key 
server-side GWT feature -- tracking user parse log requests and recording the 
associated log/re-exp information.

This task can be solved with embedded Derby with the following steps:

(1) Creating a Derby database with the schema in the attached image -- field 
types are self-descriptive (e.g., timestamp is TIMESTAMP, and text is CLOB), 
see this page for more:
http://db.apache.org/derby/manuals/reference/sqlj123.html#HDRSII-SQLJ-21305

(2) Automating (1) by writing Java code that creates the db from scratch at a 
desired file location.

(3) Including embedded Derby as part of the SynopticGWT project -- i.e., adding 
a jar to lib/, adding a dependency in the Eclipse project. Writing a unit test 
for the availability of the library. Testing that the build.xml script still 
works (it should continue to work).

(4) Writing code to 

(4.a) Read and store Derby-specific options in AppConfiguration.java -- e.g., 
db filename.

(4.b) Create a Derby instance based on the args from (4.a), or set a flag to 
indicate that it is not available. Note that the db should not be initialized 
(i.e., deleted and re-created) unless the file does not exist. If it does 
exist, it should be opened as a Derby DB, and on error the server should raise 
an exception.

(4.c) Capture user data on log uploads/parse requests into the initialized 
Derby instance.

(4.d) Make sure that the Derby instance is closed cleanly when the server 
process terminates. The db cannot be left in a corrupted state if the server 
happens to crash.
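
Steps (4.b) and (4.d) could be sketched roughly as follows, assuming derby.jar is on the classpath. The class and method names (DerbyConnector, openOrCreate, closeOnExit) are hypothetical and not taken from the SynopticGWT code; only the `create=true` and `shutdown=true` connection URL attributes are Derby's own.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper; names are illustrative only. */
final class DerbyConnector {
    /** Builds an embedded Derby JDBC URL; ";create=true" asks Derby to create a missing db. */
    static String connectUrl(String dbDir, boolean create) {
        return "jdbc:derby:" + dbDir + (create ? ";create=true" : "");
    }

    /** Builds the URL whose connection attempt shuts down a single database. */
    static String shutdownUrl(String dbDir) {
        return "jdbc:derby:" + dbDir + ";shutdown=true";
    }

    /** Opens the db if its directory exists, otherwise creates it from scratch (step 4.b). */
    static Connection openOrCreate(String dbDir) throws SQLException {
        boolean exists = new File(dbDir).isDirectory();
        // Any SQLException is propagated so the server can raise an error.
        return DriverManager.getConnection(connectUrl(dbDir, !exists));
    }

    /** Registers a JVM shutdown hook that requests a database shutdown (step 4.d). */
    static void closeOnExit(String dbDir) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                DriverManager.getConnection(shutdownUrl(dbDir));
            } catch (SQLException e) {
                // Derby signals shutdown by throwing; report the SQLState for diagnosis.
                System.err.println("Derby shutdown SQLState: " + e.getSQLState());
            }
        }));
    }
}
```

A shutdown hook does not run if the process is killed outright. In that case the sketch relies on Derby running its own recovery the next time the database is booted.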

Original issue reported on code.google.com by bestchai on 6 Feb 2012 at 11:00

Blocking: #165

Attachments:

initial-synoptic-db-schema.jpg

GoogleCodeExporter commented 9 years ago

Original comment by bestchai on 6 Feb 2012 at 11:01

GoogleCodeExporter commented 9 years ago

What should the data types be for the hash field and the result field? From 
looking online, a hash should be a VARCHAR(n), but I'm not sure what the value of n is.

Original comment by kevin.a....@gmail.com on 11 Feb 2012 at 9:21

GoogleCodeExporter commented 9 years ago

These are the SQL statements I'm executing in my Java code for (2). 

CREATE TABLE Visitor (vid INT PRIMARY KEY, IP INT, timestamp TIMESTAMP);
CREATE TABLE UploadedLog (logid INT PRIMARY KEY, text CLOB, hash VARCHAR(255));
CREATE TABLE ReExp (reid INT PRIMARY KEY, text CLOB, hash VARCHAR(255));
CREATE TABLE LogReExp (parseid INT PRIMARY KEY, reid INT, logid INT);
CREATE TABLE SplitReExp (parseid INT PRIMARY KEY, reid INT, logid INT);
CREATE TABLE PartitionReExp (parseid INT PRIMARY KEY, reid INT, logid INT);
CREATE TABLE ParseLogAction (vid INT PRIMARY KEY, timestamp TIMESTAMP, parseid INT, result VARCHAR(255));

Original comment by kevin.a....@gmail.com on 12 Feb 2012 at 3:58

GoogleCodeExporter commented 9 years ago

The hash field should be a fixed length field, like CHAR(32).

For now, assume that this will store the 16 byte (32 char) MD5 digest of the 
corresponding text. Here are some relevant resources:
http://docs.oracle.com/javase/6/docs/api/java/security/MessageDigest.html
http://stackoverflow.com/questions/415953/generate-md5-hash-in-java
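
A minimal sketch of computing that 32-character hex digest with java.security.MessageDigest. The class and method names (Md5Hex, md5Hex) are hypothetical, and encoding the text as UTF-8 is an assumption, since the thread does not say which encoding to hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for filling a CHAR(32) hash column. */
final class Md5Hex {
    /** Returns the MD5 digest of the text as 32 lowercase hex characters. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: the text is hashed as UTF-8 bytes.
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on the Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

For example, `Md5Hex.md5Hex("abc")` yields `900150983cd24fb0d6963f7d28e17f72`, which fits a CHAR(32) column exactly.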

Original comment by bestchai on 15 Feb 2012 at 12:42

GoogleCodeExporter commented 9 years ago

A few notes:

- Add DerbyDB unit tests to test/synopticgwt.server. These should (1) create a 
new db and tables, (2) open an existing table, (3) write to a table, and read 
back from a table and check that the read data is the same as the written data.

- Extend AppConfigurationTests.java to check DerbyDB fields on null context

- Refactor DerbyDB to have static final strings for create statements, and a 
single method that does actual table creation.

- AppConfiguration class should (1) have the constructor open a DB if the path 
arg points to a non-empty dir, (2) have the constructor create the DB from 
scratch if the path arg points to an empty dir, (3) disable derby functionality 
if the path arg is null or the path dir doesn't exist, (4) have a finalize 
method that shuts down the derby db. Make sure to print lots of diagnostic 
messages for all of these cases, so that it is easy to debug. And document the 
args to start the server with a derby db instance.

- DerbyDB should have methods for writing/reading the tables. Try to find OO 
interfaces for this, instead of using raw sql.
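
The three constructor cases in the AppConfiguration note above could be separated from the Derby calls themselves, roughly like this. DerbyDirPolicy, Action, and decide are hypothetical names; this is only a sketch of the decision, with a diagnostic message per case.

```java
import java.io.File;

/** Hypothetical helper that decides what to do with the configured Derby dir. */
final class DerbyDirPolicy {
    enum Action { OPEN_EXISTING, CREATE_NEW, DISABLED }

    static Action decide(String derbyDBDir) {
        if (derbyDBDir == null) {
            System.err.println("Derby disabled: no db dir configured");
            return Action.DISABLED;
        }
        // list() returns null when the path is missing or is not a directory.
        String[] entries = new File(derbyDBDir).list();
        if (entries == null) {
            System.err.println("Derby disabled: dir does not exist: " + derbyDBDir);
            return Action.DISABLED;
        }
        if (entries.length == 0) {
            System.err.println("Creating Derby db from scratch in: " + derbyDBDir);
            return Action.CREATE_NEW;
        }
        System.err.println("Opening existing Derby db in: " + derbyDBDir);
        return Action.OPEN_EXISTING;
    }
}
```

Keeping the decision free of JDBC calls makes it easy to cover in AppConfigurationTests.java without a live database.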

Original comment by bestchai on 15 Feb 2012 at 9:05

GoogleCodeExporter commented 9 years ago

Additional details:

What exactly is the result field of the ParseLogAction supposed to hold?

1. The error string, if there was an exception or error displayed to
the user in the GWT interface.

2. Properties of the result, in some structured/well-defined format.
Some things that I think would be interesting to include:
- Number of nodes/edges in the returned graph.
- Number of traces parsed from the log
- Number of unique event types parsed out
- Number of invariants parsed from the log, for each kind of invariant type
- Time in seconds that it took to (1) mine the invariants, and (2) to
derive the final model

This could be in some format like this:
edges:10,nodes:50,traces:1000,etypes:100,afby:10,nfby:10,ap:5,miningtime:10,synoptictime:100
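
A small sketch of writing and reading a result string in this key:value,key:value shape. ParseResultFormat and its methods are hypothetical names, and treating every value as an integer is an assumption based on the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical encoder/decoder for the ParseLogAction result field. */
final class ParseResultFormat {
    /** Joins the properties as "key:value" pairs separated by commas, in map order. */
    static String format(Map<String, Integer> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses a string produced by format(); rejects malformed pairs. */
    static Map<String, Integer> parse(String result) {
        Map<String, Integer> props = new LinkedHashMap<>();
        if (result.isEmpty()) {
            return props;
        }
        for (String pair : result.split(",")) {
            String[] kv = pair.split(":", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            // Integer.valueOf throws NumberFormatException on a non-numeric value.
            props.put(kv[0], Integer.valueOf(kv[1]));
        }
        return props;
    }
}
```

If the result column stays a VARCHAR(255), the formatted string has to fit in 255 characters, so the key names are best kept short.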

Original comment by kevin.a....@gmail.com on 23 Feb 2012 at 8:01

GoogleCodeExporter commented 9 years ago

Solution in revision 844f45e82d0a, please code review.

Unimplemented notes:
- The time to derive model (synoptictime) isn't a property in the result. This 
would require me to write everything to Derby after the final model, but 
currently everything is written after parsing the log.
- Creating a database from an empty dir doesn't work. It throws an error. I was 
only able to create a database when pointing to a new (not yet existing) dir.
- Current solution assumes that parsing occurred with no exceptions. I haven't 
made result write an error string to Derby.

Other notes:
- When the database is shut down, it will throw an exception. This is an 
expected result though.
- For log lines stored in UploadedLog, if you use the command-line ij to select 
a log, the entire log line isn't viewable. It only displays about 150 
characters. This confused me for a while. The entire log is actually stored 
though, I verified by printing out a stored log through DerbyDB.
- The DerbyDB class methods are currently made for just this issue. Other 
useful methods needed in the future will probably need to be written.

Original comment by kevin.a....@gmail.com on 27 Feb 2012 at 12:01

GoogleCodeExporter commented 9 years ago

I committed some fixes, mostly having to do with SQL exceptions. When dealing 
with exceptions, you never want to silently ignore them -- always percolate 
them up to a level where they are supposed to be handled. In the case of the 
DerbyDB library, the library should never handle exceptions (except maybe the 
shutdown() case, though we need to distinguish between clean/failed shutdowns).
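
One possible way to tell clean and failed shutdowns apart is to inspect the SQLState of the exception. This sketch assumes Derby's documented behavior, where a successful whole-engine shutdown reports SQLState XJ015 and a successful single-database shutdown reports 08006. DerbyShutdownCheck and isCleanShutdown are hypothetical names.

```java
import java.sql.SQLException;

/** Hypothetical helper for classifying the exception thrown on Derby shutdown. */
final class DerbyShutdownCheck {
    /**
     * Returns true if the exception carries the SQLState that Derby documents
     * for a successful shutdown of the whole engine or of a single database.
     */
    static boolean isCleanShutdown(SQLException e, boolean wholeEngine) {
        String expected = wholeEngine ? "XJ015" : "08006";
        return expected.equals(e.getSQLState());
    }
}
```

A shutdown() method could then swallow only the exception for which this returns true and rethrow everything else, which keeps the rule of never silently ignoring real errors.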

I also updated AppConfiguration and the server side tests. If no derbyDBDir is 
set, Derby support should be disabled and all the code should continue to work 
as expected. Think of Derby support as a bonus feature. The current code fails 
with "INFO: null viddd" when derby support is disabled. Please fix this and 
make the code work without relying on derby.

The most important comment about the current code concerns DerbyDB.java. The db 
interface it provides is very broad. This means that the DerbyDB class is small 
and simple, but the client code in SynopticGWT.java is hairy and full of SQL 
statements. I think that DerbyDB should hide all SQL logic -- the client code 
(i.e., SynopticGWT.java) should never have to worry about constructing SQL 
statements. The client should just call a method with the right args, and 
DerbyDB should perform all the necessary (perhaps multiple) DB operations. To 
start, I think you should migrate all SQL into DerbyDB, and then narrow the 
interface to dead-simple methods that insert/update specific tables/relations 
in the database. Another change you might want to implement is to create a 
class per table. This would allow you to refactor DerbyDB.java so that it keeps 
track of a set of tables, but the tables themselves know what operations they 
support, how to create the underlying table in the DB, etc. All of this 
refactoring will also simplify a future migration to an ORM, if we ever head 
this way. An ORM abstracts away DB functionality using OO; encapsulating all 
the DB logic in DerbyDB.java and table-specific objects is a first step towards 
this.
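
A rough sketch of the class-per-table idea, using the UploadedLog table as the example. All class and method names here (DerbyTable, UploadedLogTable, DerbyDBSketch) are hypothetical and are not the classes in the repository; the column types follow the schema discussed earlier in this issue, with hash as CHAR(32).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class: each table knows how to create itself. */
abstract class DerbyTable {
    abstract String createSql();

    void create(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(createSql());
        }
    }
}

/** Hypothetical table class that hides the SQL for UploadedLog from clients. */
final class UploadedLogTable extends DerbyTable {
    static final String CREATE =
        "CREATE TABLE UploadedLog (logid INT PRIMARY KEY, text CLOB, hash CHAR(32))";
    static final String INSERT =
        "INSERT INTO UploadedLog (logid, text, hash) VALUES (?, ?, ?)";

    @Override
    String createSql() {
        return CREATE;
    }

    /** Inserts one log; SQLExceptions propagate to the caller. */
    void insert(Connection conn, int logId, String text, String md5Hex) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT)) {
            ps.setInt(1, logId);
            ps.setString(2, text);
            ps.setString(3, md5Hex);
            ps.executeUpdate();
        }
    }
}

/** Hypothetical facade that keeps track of the set of tables. */
final class DerbyDBSketch {
    private final List<DerbyTable> tables = Arrays.<DerbyTable>asList(new UploadedLogTable());

    List<DerbyTable> tables() {
        return tables;
    }

    /** Single method that does the actual table creation. */
    void createAllTables(Connection conn) throws SQLException {
        for (DerbyTable table : tables) {
            table.create(conn);
        }
    }
}
```

With this shape, client code such as SynopticGWT.java would call a method like insert() with plain arguments and never build an SQL string itself.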

More line by line comments in revision d0ab73026f2c, and revision 9e73530c8454.

Original comment by bestchai on 27 Feb 2012 at 7:28

GoogleCodeExporter commented 9 years ago

Solution in revision 5aa4dc782535, please review.

You left a comment in revision 9e73530c8454 about a way to distinguish a clean 
shutdown and failed shutdown. There doesn't seem to be a way. According to this 
http://www.coderanch.com/t/471003/open-source/Derby-Closing-Db, throwing a 
SQLException is standard.

Original comment by kevin.a....@gmail.com on 7 Mar 2012 at 3:25

GoogleCodeExporter commented 9 years ago

I had to significantly refactor your code, but the overall implementation is 
sound. Take a look at my changes in revision 8287a52e9bd7 to understand what 
I've done.

Merged into default/issue fixed with revision 90f65abb54d3.

Original comment by bestchai on 12 Mar 2012 at 8:52

Changed state: Fixed
