TheHarvard commented 3 years ago

Data Storage in programs

This issue aims to create a discussion around how we should store data in programs.

The standard (and only) way to store data in RAPID programs has been statically declared Data and Arrays. While these are highly efficient, and should be used when approprierte going forward, we are also seeing that some new, more sophisticated options are becoming possible as part of the RNL library.

The problem of Static declaration

The main limitation of the standard data declaration in the RAPID programming language is that the data is statically declared, meaning:

Any data that is going to be used must be manually declared, somewhere In the program.
A data can only be the datatype it is declared to be. The datatype can't be changed at runtime.
An Array can only have the dimensions it was declared with.
The size of each array dimension is statically declared, and can not change (Grow) at runtime.

This often leads to dirty solutions with temp-data or meta counters to signify what parts of the arrays are in use.

New data storage types

As RNL implements concepts like the datapointer, we see that the following alternative storage methods are becoming viable:

The linked list

By linking multible datapointers in a chain you can create a list that can:

Contain multible, arbitrary datatypes
Grow and shrink as elements are added and removed
Be easily inserted into and extracted from without having to move data (only alter pointers)

Due to the inherent design of the list, the code needs to crawl over all elements until it finds the one it wants to perform some operation on. This mean that speed drops proportionally with list length.

A list would also behave exactly like an object, allowing you to create a list of lists for a multidimensional matrix.

A large number of sub types of the linked list could also exist. these sub types could enforce useful restrictions (like enforcing a FIFO queue) or alter the behaviour (Emulating unordered lists). Examples can be:

Type list -> A list containing only one, specified datatype.
Unordered list / set -> A list that does not have any order
Stack -> A list that can only use Append_Front and Pop_Front. LIFO
Queue -> A list that can only use Append_Front and Pop_Back. FIFO
Dictionary -> A list that enforces all elements to be key-value pairs, and uses the key instead of index.

The dictionary

A dictionary is a list of key value pairs that has the same benefits as linked lists.

A dictionary can be made of a linked list that refers to each key value pair, or it can structured as a hash table.

I propose we use linked lists as default for dictionaries, and have a different structure called "Hashtable" for hashtables.

The Hashtable

The Hashtable is a structure for storing data where a specific element can be reached by supplying a key. The key is decoded into a number (Hashed) and this number indicates where in the hashtable the element exists. In theory, a hashtable should be more efficient than a linked list for large sets of data, but comes with a couple more limitations. Whether we can implement it in rapid in a way that emphasizes this boost to efficiency is uncertain; we might have to use arrays with static sizes to avoid having to crawl over all elements in a linked list.

Terminology

When implementing these new structures, we should settle on a unified terminology to use across all programming.

I propose the following:

In the table below i have set up two sets of methods and operation that you can imaging being generally universal for data storage. I have taken inspiration from C++ and Pythons approach and naming convention. I have set up a main colum with what i belive would be the better option, but i have also set up an alternative colum containing a slightly different approach.

Comments to which one is more intuitive and readable is welcome.

Name	Alternative name	Comment

lenght	count	Get the lenght (Size) of a list/dictionary/array
maxLenght		Get the max lenght (Size) of a list/dictionary/array, if applicable
has		Returns true if a specified index/key is valid for a list/dictionary

get		return the element at a spesific index/Key
front	get (index = 1)	return the frontmost element (index = 1)
back	get (index = lenght)	return the last element (index = lenght)

pop	pop (With optional Arg)	pop (Return and delet) the the element at a spesific index/Key
pop_front	pop (index = 1)	pop (Return and delet) the frontmost element (index = 1)
pop_back	pop	pop (Return and delet) the last element (index = lenght)

set		set the element at a spesific index/Key (Create the key if it does not exist in dictionary)
insert		Insert a new element at a spesific index of a list
append	append (With optional Arg)	Add a new element to an unordered list (Not allowed for ordered lists)
append_front	insert (index = 1)	Add a new element in front of the frontmost element (index = 1)
append_back	append	Add a new element after the last element (index = 1)

delete	remove	Remove a element and shorten the list


clear		clear all elements from the list/Dictionary
copy		create a copy of the list/Dictionary (Also allows a part of a list to be copied without copying the whole)
reverse		reverse the order of a ordered list/dictionary

extend		Add one list/dictionary to the end of another
merge

split		Split the list in two at a given index


search	index	serch for and return the index of a spesific element in the list
sort		Sort the order of the elements in a ordered list/dictionary

Actions

We should determine if this is an appropriate way to build data lists, and that this naming convention is approprierte.

Once done, we can start developing these structures and enforcing the convention.

TheHarvard commented 3 years ago

If @RobotSigmund or @AGus-RN Have comments on this, then feel free to add them here.

TheHarvard commented 3 years ago

if we resolve this i think we can close #37

TheHarvard commented 3 years ago

RNL group meeting 2020/01/05 with: @TheHarvard @RobotSigmund

"Clear" and "Set" are allready instructions in RAPID. Both commands are non essential for RAPID, and have more readable altarnatives. We don't consider it an issue overwriting theese.

Actions:

Start development of a linked list strcture using datapointers.

byte-motion / RNL_RAPIDLibrary

linked list strcture using datapointers #57

Data Storage in programs

The problem of Static declaration

New data storage types

The linked list

The dictionary

The Hashtable

Terminology

Actions