Develop the query command

darozak commented 5 years ago

We need to develop the FORTH-based query command so that players and their drones can learn about events. Please see Chapter 7 of the Game Design Manual for details.

crodnun commented 5 years ago

After reading Chapter 7, I have (as expected) many doubts. As usual, the best way to clarify them would be an example; that way I would see clearly what you have in mind.

I have tried to follow this logic:

But I think that there are some typos there; table names sometimes do not match the ones defined above.

Anyway, let me know if I am right: what I have understood so far is that we want to build SQL queries using the FORTH query command, is that correct?

The most difficult part for me is to mentally connect the 5 parameters supplied to the query command (some of them are just internal memory stuff) with the real world.

I can also envisage an issue with the results array storage. There, you mention that based on the size we allocate N bytes, but to decode this array of bytes we need to know what type of storage we are using for each field. Normally this is solved using a TAG-LENGTH-VALUE format: a tag to identify the field ID, a length to tell us how many bytes follow, and the value, which is just the bytes you mention there.
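As an illustration of the TAG-LENGTH-VALUE layout mentioned above, here is a minimal sketch in Python (used only for brevity; the engine itself is written in C). The one-byte tag, the one-byte length and the field IDs 1 and 2 are assumptions made for this example and are not taken from the Design Manual.

```python
def tlv_encode(fields):
    """Encode (tag, value_bytes) pairs as TAG-LENGTH-VALUE records."""
    out = bytearray()
    for tag, value in fields:
        if not 0 <= tag <= 255 or len(value) > 255:
            raise ValueError("tag and length must each fit in one byte")
        out.append(tag)          # TAG: which field this is
        out.append(len(value))   # LENGTH: how many value bytes follow
        out.extend(value)        # VALUE: the raw bytes
    return bytes(out)


def tlv_decode(data):
    """Decode a TLV byte string back into a list of (tag, value_bytes)."""
    fields = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[pos], data[pos + 1]
        pos += 2
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        fields.append((tag, bytes(value)))
        pos += length
    return fields


# Hypothetical record: field 1 is a 16-bit object ID, field 2 is a name.
encoded = tlv_encode([(1, (3).to_bytes(2, "little")), (2, b"Earth")])
print(tlv_decode(encoded))
```

With this layout a reader can walk the results array without knowing the field types in advance, at the cost of two extra bytes per field.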

Another doubt I have is regarding the "internal database" itself: I do not understand whether it is part of our current tables, a logical database, or something else. It seems that we are building SQL statements to query some tables, but I cannot see what tables would be used here.

No more for the moment. As I stated, I think that the best way to get a clear understanding of this is to give me an example of how it would work in real life, just the simplest one: a user sends this by email, then we do that, etc.

Cheers

crodnun commented 5 years ago

I have read the latest query command specifications and it seems that I understand this part a bit more. It is slow going, but I think I can see what you want here; it will be easier once we program it with a simple example.

Here you have my conclusions, let me know which are correct, which are not:

The goal of the query command is just to retrieve part of the information we have in DB. It is just an intermediate layer to let users run SQL queries from FORTH commands.
Once we retrieve the results, information is persisted inside the VM to be queried later using FORTH.
We define 3 tables that allow us to build the final SQL query: "SELECT X,Y from A join B ...", once we have it and unless the administrator changes it, we have a kind of "static" query string that we execute against mysql each time the user invokes it.
If the administrator modifies something inside the query-related tables, we need to re-calculate this query script.
Both the alias and the results are stored inside the VM memory.

Now the doubts and questions:

Playing with internal memory allocation is a risk: any mistake on the user's side can lead to a corrupted VM. Do we know how to check that the results will really fit inside the allocated space? If we allow the user to deal with pointers directly, we are putting their VM at risk. We can store strings without trouble (using the proposed delimiter), but I am really concerned about the memory allocation.
How will the user see the results stored in the VM? I guess just by using the results memory pointer, but we should also describe this step somewhere. At some point we will need to read this structured memory from the VM and send it to the user by email, won't we? We will need to use a different command or wrap the FORTH command somehow.
I think that the current DB model (the one we have deployed at advolition.com) is not up to date. Do we take the model in the latest design document as the official one?

darozak commented 5 years ago

Yes. I think you are understanding the objectives now. Here are my responses to the specific points you made above:

The goal of the query command is just to retrieve part of the information we have in DB. It is just an intermediate layer to let users run SQL queries from FORTH commands. [Correct]

Once we retrieve the results, information is persisted inside the VM to be queried later using FORTH. [Exactly]

We define 3 tables that allow us to build the final SQL query: "SELECT X,Y from A join B ...", once we have it and unless the administrator changes it, we have a kind of "static" query string that we execute against mysql each time the user invokes it. [Only the FROM script remains static. The user essentially defines the SELECT clause and other SQL clauses with each query via the Search String.]

If the administrator modifies something inside the query-related tables, we need to re-calculate this query script. [The only thing that is modified based on changes to the query-related tables is the FROM clause. All other query clauses are defined by the user in the Search String.]

Both the alias and the results are stored inside the VM memory. [Only the user-supplied Search String and DB-supplied Results Array are stored in the VM]

I also understand your concerns about the user accidentally overwriting portions of the VM. However, this is an inherent risk in any FORTH programming because FORTH gives the programmer free and unrestricted access to machine memory with the fetch (@) and store (!) commands. We will need to mitigate this risk by (1) encouraging beginners to access the database via predefined FORTH commands which carefully control memory access, (2) allowing users to abort any corrupted code, and (3) allowing users to reinitialize their drones if the VM is corrupted.

The user will use FORTH routines to access DB results stored in the VM.

Yes. The DB should be brought in alignment with the architecture as defined in the Design Manual. Also note that if we decide to implement the query routine as it is currently defined, we will need to rename all SQL tables and fields so that they begin with an underscore (_). This underscore will prevent users from directly accessing the SQL tables and queries via their Search String.

crodnun commented 5 years ago

Hi Dave, what about pushing this part with a quick example, as we did with protocols, events, etc., just to get used to the ideas you have for this feature? My suggestion would be to define an initial query, so that I can implement it while I grasp the idea. Also, what about adding a kind of "reset" command to clean the user's VM and start from scratch? If needed, we can deal with it in a different issue. This new command would clean the VM, and the user would lose any previously defined words. What do you think?

darozak commented 5 years ago

Good idea. I'll write up a scenario or two that show how the query command will function. I'll also create an issue for creating a reset command.

darozak commented 5 years ago

I've tried to simplify the query logic and make it more resistant to hacking. I was worried about the open ended approach that allowed players to write their own SQL queries. That strategy introduces too many possibilities that we need to control against.

You can find a description of the new query mechanism, along with a number of examples here: https://github.com/Darwin-River/Ex-Machinis/wiki/Database-Searches/7be874aa54fe1d0ede43cc3699ca56c422c3be5e

Please review and let me know if you have any concerns with the revised approach.

Also, please commit the latest version of the code. It doesn't look like the repository has been updated since May.

Thanks!!

crodnun commented 5 years ago

Hi Dave, after reading the new query mechanism, it seems much simpler now, no doubt (at least at first glance, it is really easier to understand than the previous one). I think that we have there all I need to implement it. Of course, new doubts will appear on the fly, but so far it seems OK to me. My only concern is the results storage: 16 bits for integers might not be enough if we need to deal with big numbers at some point. Anyway, I think that we can push ahead with this, and I will update you on the progress.

crodnun commented 5 years ago

One question here, with this new query command. I think that we can remove the old tables: query_fields, table_joins, etc,... am I correct? This feature now will only require a single table, the queries table, isn't it?

darozak commented 5 years ago

Hi Carlos. You're correct that the new query routine only requires the queries table. We can get rid of query_fields and table_joins.

FORTH is set up to handle 16-bit numbers, however it's also possible to work with 32 bit numbers if necessary. I agree that we should allow queries to return 32-bit numbers to the VM as well.
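One way to hand a 32-bit result to a VM built around 16-bit cells is to split it across two consecutive cells. The Python sketch below is an illustration only; the order of the two cells (low cell first) is an assumption, since this thread does not say which order the engine would use.

```python
def split_int32(value):
    """Split a 32-bit value into (low, high) 16-bit cells."""
    value &= 0xFFFFFFFF
    return value & 0xFFFF, value >> 16


def join_int32(low, high):
    """Rebuild the 32-bit value from its two 16-bit cells."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


low, high = split_int32(70000)   # 70000 does not fit in 16 bits
print(low, high, join_int32(low, high))
```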

Dave

crodnun commented 4 years ago

I could push this a bit today:

DB is updated
Tables at the server contain a couple of queries now based on the requirements provided.
Added 'query' word as an extension to our VMs.
Basic query support is in place (by this I mean that you can send a query and the game replies with a dummy message, which I detail later in this post).

Some minor fixes were required to accommodate the requirements:

The size of the script field was too small, defined as varchar(512) in real life.
Example queries were not accurate, just in case you want to amend the doc, these are the valid ones I put inside the DB (backup done):

mysql> select script from queries;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| script                                                                                                                                                                                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT observations.id, observations.time FROM observations INNER JOIN events ON observations.event = events.id WHERE observations.drone = [drone_id] AND observations.time BETWEEN CURRENT_TIMESTAMP AND CURRENT_TIMESTAMP - INTERVAL [value_1] MINUTE ORDER BY events.timestamp DESC |
| SELECT id FROM objects WHERE objects.name='[string_1]'                                                                                                                                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

So far, so good, but I have found a couple of stones on our road, I describe them here to discuss the better solution together:

The query word is already defined inside FORTH, and the interpreter adds an annoying text to the output. You can test this by sending an email to your drones, e.g.

1 2 3 query

Sending it you will receive the following message at the reply:

Command not supported yet **query redefined**

(the last part inside ** is added by FORTH when a word is redefined, the first part is mine, we are under construction really ;-)).

A workaround would be to not use query and use another name for this, or just try to filter this extra text that FORTH puts there before sending the email to users.

When adding new extensions, we need to reset every VM (for everybody); it seems that they are not picked up on the fly. Keep this in mind: every time we add an extension like this new 'query', the VM must be regenerated for everybody, and all the words and information stored there will be lost. I hope it is not a big deal, as long as we never need to create new ones once users are playing, but maybe we should create a kind of 'all-purpose word extension' that could help us in the future. This way we will not need to erase any VM: everybody will have all the extensions they need from the beginning, and we will have this kind of wildcard/'all-purpose word' if we need to add more things in the future.

Hope this helps, let me know your thoughts regarding the same, I will continue pushing it and posting here.

darozak commented 4 years ago

Nice progress! I'm looking forward to having a functional query command since that will really open up game play options.

It looks like the "query" command is defined in embed.fth at one location and then used in a second. It should be easy to change this to something like "ask" and in the process free up "query" for our own use.

As far as the corrected SQL query, it needs to be returning events.id and events.time rather than observations.id and observations.time.

I'm not sure what to do about all the VMs being reset when we introduce new commands. I don't suspect we'll be doing this often but it could present a problem. I'm betting that the same reset occurs when we change embed.fth as well.

Using a general command is a good idea. We could theoretically use the same command (say "invoke") to cover all external actions. In other words, the first value on the stack would direct the game engine to run a protocol, a query, or any other specialized function. Additional values would be pulled from the stack to support the specific function.

For example, "1 invoke" would call upon our existing protocol routine and "2 invoke" would trigger the query routine. This could be cleaned up in embed.fth or by the user by defining more meaningful words such as ": perform 1 invoke ;" and ": query 2 invoke ;". We would be free to add additional functionality without disrupting game play by simply branching the C code based on the first number that "invoke" pulls from the stack.
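The branching on the first number pulled by "invoke" could look roughly like the Python sketch below (the real branch would live in the C code). The handler names and their return strings are hypothetical; the stack is modelled as a list whose last element is the top.

```python
def run_protocol(stack):
    # Hypothetical handler: pulls the protocol ID it needs from the stack.
    protocol_id = stack.pop()
    return f"protocol {protocol_id}"


def run_query(stack):
    # Hypothetical handler: pulls the query ID it needs from the stack.
    query_id = stack.pop()
    return f"query {query_id}"


# Selector number -> handler. New functions are added here,
# without defining a new FORTH extension word.
HANDLERS = {1: run_protocol, 2: run_query}


def invoke(stack):
    """Pop the selector from the top of the stack and branch on it."""
    selector = stack.pop()
    handler = HANDLERS.get(selector)
    if handler is None:
        return f"unknown function {selector}"
    return handler(stack)


print(invoke([7, 1]))   # models the FORTH line: 7 1 invoke
```

Because each handler pops its own arguments, adding a third function only means adding one entry to the table.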

At their convenience players can "" their drones to rebuild the VM using the latest version of the embed.fth... assuming that is how it works.

I kind of like this open-ended approach. What do you think?

crodnun commented 4 years ago

Yes Dave, the generic command is what I had in mind, evoke or whatever name you prefer. This would give us a powerful tool to add new extensions on the fly; users could then define their own words to wrap this generic extension. Regarding the query word, I am not sure how to do that; I do not remember how to generate the base image/VM we now use as the start VM. I think Howard had instructions somewhere to compile a new one, but we should find where. Am I correct? If you agree, we can deal with both issues separately. The most important one would be to free the query command, but for now I can go ahead and ignore the annoying message added by FORTH.

darozak commented 4 years ago

Hi Carlos. I suggest we use the existing "perform" routine as the hook for all spacecraft commands.

When someone executes "perform", the Game Engine will pull the first number off the stack and determine whether it indicates that the Game Engine should perform a special command. This would be a command that is not addressed in either the protocols or queries tables but that we might add in the future.

If the number does not belong to a special command, the Game Engine will search protocols.id for an entry that matches the number it pulled from the stack.

If it fails to find the value in protocols.id, it will search for the number in queries.id.

If it can't find the number there, it will conclude that the command doesn't exist.

In other words, every time the "perform" command is called the Game Engine will follow a simple branching structure:

If the number on the stack corresponds to a specially-defined command, execute that command and then end the routine. (We haven't defined any of these commands yet.)
Else, if the number on the stack matches a protocol ID, implement that protocol and end the routine.
Else, if the number on the stack matches a query ID, implement that query and end the routine.
Else, the command is undefined. End the routine without doing anything.

In practice, we'll assign IDs in one number range (say 1001 - 2000) to protocols and IDs in another number range (say 2001 - 3000) to queries. Special commands may occupy the value range from 1 - 1000.
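The lookup order described above can be sketched as follows. This is Python for illustration only; the three collections stand in for the special-command list, protocols.id and queries.id, and the sample IDs are made up.

```python
SPECIAL_COMMANDS = set()        # none defined yet (suggested range 1 - 1000)
PROTOCOL_IDS = {1001, 1002}     # stand-in for the values in protocols.id
QUERY_IDS = {2001, 2002}        # stand-in for the values in queries.id


def perform(stack):
    """Pop one number and report which kind of command it names."""
    command_id = stack.pop()
    if command_id in SPECIAL_COMMANDS:
        return ("special", command_id)
    if command_id in PROTOCOL_IDS:
        return ("protocol", command_id)
    if command_id in QUERY_IDS:
        return ("query", command_id)
    return ("undefined", command_id)   # nothing matched: do nothing


print(perform([1001]), perform([2002]), perform([42]))
```

Keeping the ID ranges disjoint means the order of the checks never changes the outcome; it only matters if an ID is accidentally reused.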

Dave

crodnun commented 4 years ago

OK, we will reuse the perform command for the same, I have created a separated issue for this, just to keep focused on query command here.

darozak commented 4 years ago

Sounds good! But queries will be invoked via the existing perform command. Correct?

crodnun commented 4 years ago

Queries? Do you mean replacing query with perform, i.e. using perform everywhere, including for "query"? I was not considering that yet; maybe I was confused by your post about modifying the VM to rename the existing query command. But no problem doing it this way.

Could you clarify this a bit? Right now we have the following extensions at VM:

perform query report

Do you want to change them, join the first 2? Thanks

darozak commented 4 years ago

Hi Carlos. I was just trying to understand your plans for the query command. It's fine to keep it as "query" rather than combining it with "perform." Lets keep things as they are: "query" for queries and "perform" for protocols.

crodnun commented 4 years ago

Hi Dave, I am a bit stalled today, so I am sharing my issues here so that we can think about them together. The main issue I have found when dealing with a query is writing to and reading from VM memory using extensions.

We have in place the logic to parse the query command, search the DB, extract the extra parameters, etc., but I cannot see it from the point of view of a user just sending an email, which is our input.

How will the user send an address to the extension? Should they run this:

3232 x x x query

with 3232 being the address? I am a bit confused by the FORTH syntax mixed with C extensions. The purpose of the extensions is to execute some logic on the C side when we receive a special command, but the extensions do not understand syntax like this:

```
: earth $" Earth" ; ( create a named search string )
earth 2 results_array 2 query ( retrieves the object ID for Earth )
results_array @ . ( prints the retrieved object ID )
```

Could you detail a bit more, from the point of view of the user (using the email interface), how you expect it to work? I cannot see how it could work this way. In theory everything was clear, but now that we go deeper into the code, I am a bit confused by your examples: do we receive just numbers or FORTH commands? Thanks

darozak commented 4 years ago

Hi Carlos. I'll try to explain this a little better.

The query routine only ever pulls numbers off the stack. However, depending on how that parameter is referenced in the queries.script field ([value_1] or [string_1]), it will either treat the value as an integer number or a pointer to a section of the VM that holds a counted string.

For example, [value_1] will be replaced with an integer number that directly corresponds to the value pulled off the stack.

However, if the script references [string_1] instead, that same value will be treated as a pointer to a specific memory location in the VM. At this location the Game Engine will find the start of a string, whose first byte holds a value that represents the length of the string ($00-$FF = 0 - 255). In this case, the Game Engine will replace [string_1] with the string that is stored in this memory location.

What may be confusing you is the fact that, in the examples I provided, the user begins by defining two FORTH words that point to the locations of the search string and results array in VM.

": earth $" Earth" ;" adds the word "earth" to the drone's dictionary. Whenever "earth" is called in a FORTH script, it places a number (say 3234) on the stack, which points to the string "Earth" in VM. The word "results_array" was also created as a variable in the first of the two examples. This likewise means that when "results_array" is called in a FORTH script, it places a value on the stack (such as 4355), which points to the location in memory that holds the results_array array.

Therefore, the following line of code "earth 2 results_array 2 query" ends up placing 3234, 2, 4355, and 2 on the stack (from bottom to top) for use by the query routine. The first 2 (on the top of the stack) indicates the queries.id and associated queries.script SQL script that will be performed. The second value on the stack, 4355, gives the location where the query routine will start depositing the results from the SQL query. The next 2 sets the maximum length of the deposited results to two bytes. This is because the returned ObjectID for Earth should only be two bytes long.

Finally, 3234 points to the query string that will be used to replace [string_1] in queries.script. To retrieve this string, the Game Engine will first go to 3234 in VM where it will find the length of the string in bytes (Earth = 5). Then it will pull each of the letters for the string from memory locations 3235-3239 (E - a - r - t - h). This will be used to replace [string_1] to make a complete SQL query that can be executed via the C++ code.

Does this help? I really need to add a section to the player's manual that details how FORTH handles variables, arrays, and strings. It's not intuitive at first, but once you understand how it works, everything makes sense.
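The walk-through of "earth 2 results_array 2 query" can be mimicked with a small Python sketch (illustration only; the engine does this in C). The addresses 3234 and 4355 are the example values used in the explanation, and the function names are hypothetical.

```python
def read_counted_string(memory, address):
    """Read a counted string: one length byte, then that many characters."""
    length = memory[address]
    return bytes(memory[address + 1:address + 1 + length]).decode("ascii")


def pop_query_arguments(stack, parameter_count):
    """Pop the query arguments, top of stack first."""
    query_id = stack.pop()          # top of stack: queries.id
    results_address = stack.pop()   # where the results will be deposited
    max_length = stack.pop()        # maximum size of the results, in bytes
    parameters = [stack.pop() for _ in range(parameter_count)]
    return query_id, results_address, max_length, parameters


# A 64K VM image with the counted string "Earth" stored at address 3234.
memory = bytearray(65536)
memory[3234] = 5
memory[3235:3240] = b"Earth"

stack = [3234, 2, 4355, 2]          # left by: earth 2 results_array 2 query
query_id, results_address, max_length, parameters = pop_query_arguments(stack, 1)
search_string = read_counted_string(memory, parameters[0])
print(query_id, results_address, max_length, search_string)
```

The number of parameters to pop (1 here) would come from queries.parameters for the selected query.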

crodnun commented 4 years ago

Hi Dave, pushing this I have a doubt regarding the order of the following tags we can find in a query:

Query Variables
When writing a script for the queries.script field, the system administrator will use SQL syntax to indicate how data will be selected, sorted, and reported back to the drone. In addition to naming specific tables, fields, joins, and search terms in the query script, the system administrator can include variables, which will be replaced with player or Game Engine supplied values when the query is actually performed. The variables that the system administrator will be able to use in the query script are as follows:

[value_1] will be replaced by the first query specific parameter that is pulled from the stack and rendered as a decimal integer. [value_2], [value_3], and [value_4] will do the same with the second, third, and fourth query-specific parameters that are pulled from the stack.
[string_1] will be replaced by the string that is pointed to by the first query-specific parameter on the stack. The ASCII string will be pulled from the drone’s memory location pointed to by the user supplied address. [string_2], [string_3], and [string_4] will do the same for the second, third, and fourth user-supplied values found on the stack.
[drone_id] will be replaced by the integer ID of the drone that is performing the query.
[time_delay] will be replaced by the distance in light minutes that the drone is from Earth.

Mixing string_x with value_x could be a mess. We should force some kind of order here; otherwise it could be complicated to determine what the first parameter on the stack is. For example:

select [value_1], [string_1] from X;

What is the type of the first stack value: int or string? Should we process them in order? Right now I just try to replace tags with values, but I ran into this issue when we mix both types.

I assume that value_2 should not be defined if value_1 is not defined, right?

Could you clarify this part a bit, thanks

darozak commented 4 years ago

Hi Carlos.

We shouldn't need to impose any order on where numbers and string pointers appear on the stack.

Only numbers can be stored on the stack. So, the inputs to the query routine will always be numbers. However, it's up to the query routine to determine how to interpret that number.

If the SQL script calls for a number (i.e. [value_1] or [value_2]), then the query routine will simply insert the number it pulled from the stack in that location.

However, if the SQL script calls for a string (i.e. [string_1] or [string_2]) then the query routine will take the same number it used in the example above and treat it as a pointer to a VM memory location that contains a string.

The query routine will interpret the first byte that is pointed to by that memory location as an 8 bit number (0-255), which indicates the length of the string. It will then retrieve the next (0-255) bytes from memory and treat each as a character in the string that it will use to replace [string_1] in the SQL script.

If the user-provided value accidentally points to a section of memory that does not contain a string, the query will still convert 0-255 bytes from that location into a character string and insert it in the query. The end result will likely be a valid query. However, it will fail to return any results. This is because the garbage string is unlikely to appear anywhere in the game's database.

The one thing the query command will need to do when retrieving a string of characters from memory in this fashion is ensure that no illegal characters are present that would mess up the SQL query. For example, a single quote would not be permissible since this is what is used to bound strings in an SQL statement.
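A minimal sketch of such a character check, in Python for illustration. The whitelist below (letters, digits, space, underscore, hyphen and period) is an assumption; the thread only states that a single quote must be rejected.

```python
import string

# Assumed whitelist; anything not listed here is treated as illegal.
ALLOWED = set(string.ascii_letters + string.digits + " _-.")


def is_safe_for_sql(text):
    """Return True only if every character is on the whitelist."""
    return all(ch in ALLOWED for ch in text)


print(is_safe_for_sql("Earth"))          # an ordinary object name
print(is_safe_for_sql("x' OR '1'='1"))   # contains single quotes
```

A whitelist is used rather than a list of forbidden characters so that garbage bytes read from a wrong pointer are rejected by default. Binding the string through a prepared statement, where the database client supports it, would be a more robust alternative.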

In theory, the same SQL script could contain both "[value_1]" and "[string_1]". In this case the query routine would replace [value_1] with the first number it pulled from the stack. Then, in the same SQL script, the query routine would replace [string_1] with the string (real or garbage) that is pointed to by that number. The resulting query should work fine, although it might not produce any results.

With respect to your second question, [value_2] will only be defined if the query routine is instructed to pull two or more values from the stack (i.e. queries.parameters >= 2). However, it would be perfectly acceptable for the query routine to pull two values from the stack, and the SQL script to only use the second value ([value_2]). It just wouldn't make much sense.

Also, I would suggest that if the query only pulls one value from the stack (queries.parameters = 1) and the SQL script references [value_2] or [string_2], these tags should be replaced by a zero or a blank string, respectively.
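The suggested fallback for tags that exceed queries.parameters could be sketched like this (Python, illustration only). The function name is hypothetical, and the tag range 1-4 follows the Query Variables description quoted earlier in the thread.

```python
import re

# Matches [value_1] .. [value_4] and [string_1] .. [string_4].
TAG_PATTERN = re.compile(r"\[(value|string)_([1-4])\]")


def fill_missing_tags(script, parameter_count):
    """Replace tags whose index exceeds parameter_count with 0 or ''."""
    def replace(match):
        kind, index = match.group(1), int(match.group(2))
        if index <= parameter_count:
            return match.group(0)    # a value was supplied: leave the tag
        return "0" if kind == "value" else ""
    return TAG_PATTERN.sub(replace, script)


script = "SELECT id FROM objects WHERE id=[value_1] OR id=[value_2] OR name='[string_2]'"
print(fill_missing_tags(script, 1))
```

Tags that do have a supplied parameter are left untouched, so this step can run before or after the normal substitution.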

Dave

crodnun commented 4 years ago

Hi Dave, let me detail a bit more my doubt:

I thought that we could use both types of tags at the same time. This would give us more flexibility and also more combinations.

Sharing the same stack value, we could only have 4 combinations.

I understand perfectly that everything in the stack is stored as numbers, no problem there. My concern is how to match these numbers with the query tags to replace them (by a number or by a string found using the address = number).

Let me show this with an example:

select [value_1] from [string_1] where [string_2] and [value_2]....;

And parameters configured are 4 at DB.

My question is if we should assume that the tags are in order, I mean:

[value_1] is the first value present on the stack
[string_1] is the second
[string_2] is the third
[value_2] is the fourth

This is what I mean: we have a list of parameters on the stack and they were put there in order, so we need to retrieve them in order. As you can see, value_2 can be in the third or any other position; most of the time the number in the tag has nothing to do with the real order.

My question I think is if the tags are replaced in order using stack content.

Thanks

darozak commented 4 years ago

Ok. Sorry. I think I understand your question now. At least I hope I do.

Although possible, you will probably never have a script that references [value_1] and [string_1] at the same time. Nor are you likely to find both [value_2] and [string_2] in the same script. A typical script might look something like this:

SELECT objects.id FROM objects WHERE objects.id=[value_1] or objects.name='[string_2]'

and not:

SELECT objects.id FROM objects WHERE objects.id=[value_1] or objects.name='[string_1]' (where both tags have the "_1" modifier)

However, even if the same script referenced both [value_1] and [string_1], you'd still only be pulling a single number, indicated by "_1," from the stack. The tag, [value_1], tells the query routine to "insert the value of the first number on the stack here." Similarly, the tag, [string_1], tells the query routine to "insert the string that is pointed to by the first number on the stack here." So each number on the stack can be used both as a simple value and a pointer, depending on what is called for by queries.script. The queries.description field will tell the user exactly how the query routine will use each of the values placed on the stack.

Maybe the best way to code for this is to have the routine automatically treat each number on the stack as both a value and a pointer. The routine will put the number itself into a variable for [value_1]. However, it will also use the same number as a pointer and insert the corresponding 0-255 character string into a variable for [string_1]. The query routine will then search and replace every instance of "[value_1]" in queries.script with the variable it created for [value_1] and then replace every occurrence of "[string_1]" in queries.script with the variable for [string_1]. If required, it will similarly create [value_2] and [string_2] variables from the second number that is pulled off the stack, and use these to replace the corresponding tags in queries.script.
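A minimal Python sketch of that approach, for illustration only: each number pulled from the stack yields both a [value_n] and a [string_n] replacement. The function names are hypothetical, and the illegal-character check is left out to keep the example short.

```python
def read_counted_string(memory, address):
    """Read a counted string; latin-1 maps every byte, so garbage never raises."""
    length = memory[address]
    return bytes(memory[address + 1:address + 1 + length]).decode("latin-1")


def substitute_tags(script, memory, parameters):
    """Replace [value_n] and [string_n] using the n-th number pulled from the stack."""
    for index, number in enumerate(parameters, start=1):
        script = script.replace(f"[value_{index}]", str(number))
        script = script.replace(f"[string_{index}]",
                                read_counted_string(memory, number))
    return script


memory = bytearray(65536)
memory[3234] = 5
memory[3235:3240] = b"Earth"

script = "SELECT objects.id FROM objects WHERE objects.id=[value_1] or objects.name='[string_2]'"
print(substitute_tags(script, memory, [7, 3234]))
```

Reading the string for every parameter, even when the script never uses it, costs at most 256 bytes per parameter and removes the need to parse the script first.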

crodnun commented 4 years ago

Initial version of this logic is in place now, will test it a couple of days more.

To debug this I have added a new feature at the engine (configurable).

We define at engine configuration 2 new parameters, basically, a flag and a path, using them we can dump the content of the VM into disk, after running a command.

#------------------------------------------------
# When enabled the FORTH_DUMP_VM
# we can dump agent's vm into disk for further study
#------------------------------------------------
FORTH_DUMP_VM = "1"
FORTH_VM_OUT_PATH="/home/forth/game-engine/tmp"

This way we can have at this directory all the 64K bytes VM currently in use, the naming of these files is as follows:

vm_agent_%d.dump   (where %d is the agent ID)

I consider that this feature is really handy, I added it to see the VM content at any time, just picking a given user's VM and loading it at websites like this:

https://hexed.it/

Find attached how our word earth looks:

[Image: first_word_defined]

You can review the content of every byte inside the VM, where the words are defined, etc, addresses, values,... much information I have learned from.
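For a quick look without a web tool, a dump can also be printed locally. The Python sketch below formats a slice of a byte buffer as hex plus printable characters; the sample buffer is made up, and reading a real file (for example one named per the vm_agent_%d.dump pattern) is left to the caller.

```python
def hexdump(data, start=0, length=16, width=16):
    """Return hex-dump lines for data[start:start + length]."""
    end = min(start + length, len(data))
    lines = []
    for offset in range(start, end, width):
        chunk = data[offset:min(offset + width, end)]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "."
                            for byte in chunk)
        lines.append(f"{offset:04x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return lines


# Made-up sample: a counted string "Earth" at offset 16 of a small buffer.
sample = bytearray(64)
sample[16] = 5
sample[17:22] = b"Earth"
for line in hexdump(sample, start=16, length=8):
    print(line)
```

For a real dump, the bytes would come from opening the file in binary mode and reading its whole content into `data`.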

I will test it for a couple more days, but the read and write routines seem to be working. The remaining problem is sending scripts with multiple lines to the VM; if we run them command by command, it seems to work.

I was able to successfully run these commands, sending each one in a different email:

<run>results_array 2 allot</run> 
<run>: earth $" Earth" ;</run>
<run>earth 2 results_array 2 query</run> 
<run>results_array @ .</run>

The last command returns 3, which is the object ID we obtain for Earth from the DB. Here are a few logs for this example:

* 09/01/2020,07:07:15.583 [0] Executing command at VM: [earth 2 results_array 2 query]
* 09/01/2020,07:07:15.585 [0] Running query callback
* 09/01/2020,07:07:15.585 [0] NAME [Get object ID] DESCRIPTION [Gets ID for a given game object] PARAMETERS [1] SCRIPT [SELECT object_id FROM objects WHERE object_name='[string_1]'] obtained for QUERY_ID [2]
* 09/01/2020,07:07:15.585 [0] Parameter[0]=[11016] set for query ID [2]
* 09/01/2020,07:07:15.585 [0] Processing query script [SELECT object_id FROM objects WHERE object_name='[string_1]']
* 09/01/2020,07:07:15.586 [0] Replacing tag [[string_1]] at query script [SELECT object_id FROM objects WHERE object_name='[string_1]']
* 09/01/2020,07:07:15.586 [0] Tag [string_1] parameter ID [0] obtained from [1]
* 09/01/2020,07:07:15.586 [0] Tag [string_1] stack value address [11016]
* 09/01/2020,07:07:15.586 [3] Reading byte at address [11016] bytes offset, [5508] words offset of the VM (firstByte: 1), value: [05]
* 09/01/2020,07:07:15.586 [0] VM string read [Earth] len [5]
* 09/01/2020,07:07:15.586 [0] Tag [string_1] will be replaced with value [Earth]
* 09/01/2020,07:07:15.586 [0] Allocated [56] bytes (Total memory: [248])
* 09/01/2020,07:07:15.587 [0] Query script after tags replacement [SELECT object_id FROM objects WHERE object_name='Earth']
* 09/01/2020,07:07:15.587 [0] Running VM query [SELECT object_id FROM objects WHERE object_name='Earth']
* 09/01/2020,07:07:15.587 [0] [1] entries found for query [SELECT object_id FROM objects WHERE object_name='Earth'] ([1] fields per row)
* 09/01/2020,07:07:15.587 [0] Writing integer [3] into VM address [5492], bytes [03][00]
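The tag replacement step in these logs can be sketched as follows. This is an illustration in Python, not the engine code: it assumes the parameter on the stack is the address of a counted string (one length byte followed by the characters, as the log shows for address 11016) and that each `[string_N]` tag is replaced by plain text substitution. The function names are hypothetical.

```python
def read_counted_string(memory: bytes, addr: int) -> str:
    """Read a FORTH counted string: one length byte, then the characters."""
    length = memory[addr]
    return memory[addr + 1 : addr + 1 + length].decode("ascii")

def fill_query(script: str, memory: bytes, param_addrs) -> str:
    """Replace [string_1], [string_2], ... with the strings the parameters point to."""
    for n, addr in enumerate(param_addrs, start=1):
        script = script.replace(f"[string_{n}]", read_counted_string(memory, addr))
    return script

# Rebuild the situation from the log: "Earth" stored as a counted string at 11016.
memory = bytearray(64 * 1024)
memory[11016:11022] = b"\x05Earth"

script = "SELECT object_id FROM objects WHERE object_name='[string_1]'"
print(fill_query(script, bytes(memory), [11016]))
```

The sketch does no quoting or escaping of the substituted string; how the engine handles that is not shown in the logs.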

I will let you know when it is ready for you to test and crash it!!!

darozak commented 4 years ago

Hi Carlos. This is fantastic progress! And there's the added bonus of the VM dump!!! I'm glad you came up with this. It will really help to see what's happening in these machines.

Please post an example of the combined script which didn't work.

crodnun commented 4 years ago

Hi Dave, I found the issue. It was my fault: when I copied the examples from GitHub, some strange characters were added, and they caused the problems with multi-line scripts.

So far, so good. After my tests, I consider this feature mature enough for you to play around with it and detect any bugs or problems. I will continue testing in parallel, but feel free to jump in, ask questions or suggest improvements.

I have added a new query to test strings, but I do not know how to print them from FORTH once they are stored in the results array. Could you tell me how?

These are the scripts that I have tested so far:

<run>
variable results_query2

results_query2 200 allot

100000 200 results_query2 1 query

results_query2 @ .

results_query2 2 + @ .

results_query2 4 + @ .

results_query2 6 + @ .

</run>

<run>

: earth $" Earth";

variable results_carlos

results_carlos 200 allot

earth 200 results_carlos 2 query

results_carlos @ .

</run>

This is the one where I do not know how to print the final string obtained; the int values seem OK:

<run>

: earth $" Earth";

variable result_name

result_name 100 allot

earth 100 result_name 3 query

result_name @ .

</run>

Feel free to create new queries, and let me know if you find any issue so I can debug it on my side. Everything is pushed to the repo.

Cheers

crodnun commented 4 years ago

Hi Dave, I have detected an issue with dates and their storage. Let me know if I am correct.

The specs say:

Times will be placed in memory as two 16-bit values. The first value will count the number of days from 1 January 2000. The second value will count the number of even seconds (every other second) since midnight.

But doing the maths: 24 hours x 3600 seconds/hour = 86400 seconds, which does not fit in 2 bytes (a 16-bit word). The max value is 65535.

Using 16 bits we can only count seconds up to about 18 hours into the day (65535 / 3600 ≈ 18.2).

Am I correct? If so, what about storing the days elapsed, then hours, minutes and seconds? It is just an alternative, so let me know your preference. As it stands, we will probably obtain invalid or negative values for the seconds (I hit this issue in my tests).
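The arithmetic can be checked with a few lines. The sketch below is in Python and only illustrates the 16-bit limit; the signed reading assumes the cell is printed as a two's-complement value, which is an assumption about the VM and not something confirmed in this thread.

```python
SECONDS_PER_DAY = 24 * 3600   # 86400
MAX_U16 = 0xFFFF              # 65535, largest unsigned 16-bit value

def as_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement cell."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def hms(seconds: int) -> str:
    """Format a number of seconds since midnight as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

print(SECONDS_PER_DAY > MAX_U16)        # a full day does not fit in one cell
print(hms(MAX_U16))                     # last time of day an unsigned cell can hold
print(as_signed16(12 * 3600))           # noon, read back as a signed cell
print((SECONDS_PER_DAY - 1) & 0xFFFF)   # 23:59:59 truncated to 16 bits
```

The last two lines show how a plain seconds counter could come back negative or wrapped around.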

crodnun commented 4 years ago

Hi Dave, what about another scenario: there is no room for all the data returned by the query (the allocated size is smaller than the number of bytes obtained once all the retrieved info is serialized). How should we treat this condition? Is it considered a query error? Should we return only the partial info that fits? Should we push an error code onto the stack to indicate that we only have partial info?

darozak commented 4 years ago

Hi Carlos. With respect to your string experiments, here are a couple suggestions:

You don't need to invoke the variable when using "allot". The allot command simply sets aside the specified amount of memory at the top of the dictionary so it is available to hold data. You should use it like this: "variable new_array 200 allot"
You can print a counted string from memory using the count and type commands. The count command reads the first byte at the given memory address (the length of the string) and leaves the address of the first character and the length on the stack. The type command pulls an address and a length from the stack and prints the string stored in that memory location. Here's an example of how to print the counted string stored at myhome: "myhome count type"

darozak commented 4 years ago

With respect to the date time issue, I had already considered this and decided that we could simply report time rounded up or down to the nearest even second. That's what I meant when I wrote "(every other second)" in the quoted section of the design manual. Sorry I wasn't clearer about this. Would this approach work?

crodnun commented 4 years ago

Hi Dave, I do not store a length byte in front of the strings; I just store the string in plain format. Should I add it? Do we need to store strings in the allocated buffers as FORTH strings, that is, length + data?

Maybe I misunderstood this part:

Text will be imported into the VM as strings equal in length to the size allocated for the text in the SQL database.

crodnun commented 4 years ago

With respect to the date time issue, I had already considered this and decided that we could simply report time rounded up or down to the nearest even second. That's what I meant when I wrote "(every other second)" in the quoted section of the design manual. Sorry I wasn't clearer about this. Would this approach work?

I am not sure I understand this part. Could you clarify it with an example? I am a bit confused.

darozak commented 4 years ago

Hi Carlos. As far as the strings are concerned, we do need to import strings that are a fixed length, i.e. equal to the max length set by the SQL database. It's important that we use fixed-length strings so that the user knows how to locate them in the results array when they are packed in there with other data.

However, it's likely that some of the space that's been reserved for a string will not be used since the DB string is probably only a fraction of its maximum size. Therefore, it'll be useful to record the length of the actual string followed by the string itself (FORTH-style) so that when the user prints the results using the "count type" phrase, it will only display the meaningful portion of the string and not the unused white space, which is likely to follow.
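A minimal sketch of such a field, written in Python for illustration only. It assumes the field occupies one length byte plus the maximum length from the database, and that the unused part is padded with spaces; neither detail is fixed in this thread, and the function names are hypothetical.

```python
def pack_fixed_string(text: str, max_len: int) -> bytes:
    """Build a fixed-size field: length byte, the text, then space padding."""
    if not 0 <= max_len <= 255:
        raise ValueError("a single length byte can only describe 0..255 characters")
    data = text.encode("ascii")[:max_len]          # truncate if the text is too long
    return bytes([len(data)]) + data.ljust(max_len, b" ")

def unpack_fixed_string(field: bytes) -> str:
    """Return only the meaningful part, as 'count type' would print it."""
    return field[1 : 1 + field[0]].decode("ascii")

field = pack_fixed_string("Planet", 20)   # 20 is a made-up maximum length
print(len(field), field[0], repr(unpack_fixed_string(field)))
```

Because every field has the same size regardless of the actual text, the offset of the next value in the results array stays predictable.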

darozak commented 4 years ago

With respect to the time, I'm suggesting that we take the number of seconds into the day (0-86,399) and divide it by two before saving it as a 16-bit value. This way it'll fit in the 16-bit range and, rather than tracking time in one-second intervals, we'll be tracking it in two-second intervals.

For example, if an event occurs 80,000 seconds into the day, we will divide it by two and record that it occurred 40,000 two-second intervals into the day.

I think this method is preferable to breaking out hours, minutes, and seconds from the start, because it's easier to compute the difference between two times when they are recorded in days and seconds rather than days, hours, minutes, and seconds.
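The proposal can be sketched in a few lines of Python. This is an illustration under one assumption: odd seconds are rounded down by integer division, while the thread only says the time is reported to the nearest even second. The name `encode_time` is hypothetical.

```python
from datetime import datetime

EPOCH = datetime(2000, 1, 1)   # day 0 according to the design manual

def encode_time(moment: datetime) -> tuple:
    """Return (days since 1 January 2000, seconds since midnight // 2)."""
    delta = moment - EPOCH
    return delta.days, delta.seconds // 2

# 22:13:20 is 80,000 seconds into the day, the example used above.
days, two_seconds = encode_time(datetime(2000, 1, 1, 22, 13, 20))
print(days, two_seconds)
```

The largest possible second value, 86,399 // 2 = 43,199, fits comfortably in an unsigned 16-bit cell.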

crodnun commented 4 years ago

The strings management logic is in place now.

We store strings in FORTH's counted-string format: 1 byte for the length and then the string itself.

You can test it easily now with the following command (note that it assumes the word 'earth' is already defined). It uses a dummy query I created in the DB to get the type of a given object (as a string):

<run>
variable myvariable 30 allot
earth 100 myvariable  3 query
myvariable  count type
</run>

Query ID = 3 has the following definition in DB:

| 3 | Get object type | [S1 3 query] Returns the object type for an object with the name S1. | 1 | SELECT object_type FROM objects WHERE object_name='[string_1]'; |

Running it should produce the following output email (with Earth's object type as a string):

---- Position ----

At: Earth
Distance: 0.000000 light-minutes

---- Output ----

Planet

crodnun commented 4 years ago

The logic for storing dates in the VM has been modified to use the seconds/2 value instead of the total number of seconds (this way the value fits in a single word, a 16-bit cell).

Running this command:

<run>
variable myresults 200 allot
65535 200 myresults 1 query
myresults @ .
myresults 2 + @ .
myresults 4 + @ .
myresults 6 + @ .
</run>

We obtain:

65 7291 22092 63

And the explanation using logs is as follows:

* 12/01/2020,10:07:53.639 [0] Writing integer [65] into VM address [7584], bytes [41][00]
...
* 12/01/2020,10:07:53.640 [0] Writing date [2019-12-18 12:16:24] into VM address [7586], days elapsed [7291], seconds [44184], half_seconds [22092]

65 is the ID of the first event
7291 is the number of days elapsed since 1 January 2000
22092 is the value seconds/2 (half of the seconds elapsed since midnight)
63 is the ID of the next event
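Going the other way, the values printed above can be turned back into a date. The sketch below is in Python and assumes each record is three 16-bit little-endian cells (event ID, days, seconds/2), which matches the output above but is not a documented layout; `decode_event` is a hypothetical helper.

```python
import struct
from datetime import datetime, timedelta

def decode_event(buffer: bytes, offset: int = 0):
    """Decode one record: event ID, days since 2000-01-01, seconds / 2."""
    event_id, days, half_seconds = struct.unpack_from("<3H", buffer, offset)
    when = datetime(2000, 1, 1) + timedelta(days=days, seconds=2 * half_seconds)
    return event_id, when

# The four cells printed by the script above.
buffer = struct.pack("<4H", 65, 7291, 22092, 63)
event_id, when = decode_event(buffer)
print(event_id, when)
```

The decoded date matches the one in the log line, 2019-12-18 12:16:24, and the next record would start at offset 6.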

darozak commented 4 years ago

Hi Carlos, would it be possible to switch the order of the values that the query routine pulls off the stack? The routine currently grabs the query ID, followed by the return buffer pointer and then the size of the return buffer. It occurred to me that if we pulled the query ID, the size of the return buffer and then the return buffer address (switching the order of the buffer address and size), we would have the option of using a counted string for the return buffer.

As an example, query 3 could be handled in either of the following ways:

First option: Standard approach:

variable result_name
result_name 100 allot

earth result_name 100 3 query ( buffer size will be pulled before the address )
result_name @ .

Second option: Counted string approach:

variable result_name
result_name 100 allot
result_name 100 c! ( set up result_name as a counted string with the size stored in the first byte )

earth result_name count 3 query ( count command automatically retrieves the buffer size from the first byte and adds it to the stack. )
result_name @ .

crodnun commented 4 years ago

The order change is applied, but it does not seem to work as expected on my side. Feel free to test it out.

darozak commented 4 years ago

Thanks. What part is not working?

crodnun commented 4 years ago

Thanks. What part is not working?

It seems that count pushes a 0 onto the stack, so we are not able to store the results afterwards (there is no room).

darozak commented 4 years ago

Ok. I'll give it a try.

darozak commented 4 years ago

Carlos, I think there are a couple reasons why the following example, which I suggested above, won't work.

variable result_name
result_name 100 allot
result_name 100 c! ( set up result_name as a counted string with the size stored in the first byte )

earth result_name count 3 query ( count command automatically retrieves the buffer size from the first byte and adds it to the stack. )
result_name @ .

First of all, c! pulls the address from the stack before it pulls the value to be assigned to that address. So that section of the above script should really read "100 result_name c!"

However, even if we fix this there is another more fundamental problem. The word count returns the string length followed by result_name + 1. The word count adds one to the address because that's where the string actually begins. However, even though c@ and c! can work with single bytes at odd-numbered addresses (i.e. result_name + 1), the words @ and !, which work with two-byte numbers, can only operate on even-numbered addresses (i.e. result_name or result_name + 2). So, when you store a two-byte value (the object ID) at result_name + 1, it becomes inaccessible to @.
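The alignment problem can be illustrated with a small model. This Python sketch only mimics the address arithmetic of count; it assumes result_name sits at an even address (7584 is a made-up value) and does not reproduce how the VM actually reacts to an odd address.

```python
def count(addr: int, memory: bytes):
    """Like FORTH count: return (address of first character, length byte)."""
    return addr + 1, memory[addr]

def is_cell_aligned(addr: int) -> bool:
    """Two-byte cells can only be read with @ at even addresses."""
    return addr % 2 == 0

memory = bytearray(64 * 1024)
result_name = 7584             # hypothetical even address of the buffer
memory[result_name] = 100      # what "100 result_name c!" would store

data_addr, size = count(result_name, memory)
print(data_addr, size, is_cell_aligned(data_addr))
```

The buffer address handed to query ends up odd, so a two-byte result stored there cannot be fetched with @.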

Aside from the above issues, the query command appears to be working perfectly! I've been able to use queries 2 and 3 to report data on astronomical objects like you demonstrated above. I still need to construct more complex queries and play around with those. However, I'll open new issues for any bugs I might find later on.

Thanks for your help bringing the query function online!!

Darwin-River / Ex-Machinis

Develop the query command #25