Twenkid / Vsy-Jack-Of-All-Trades-AGI-Bulgarian-Internet-Archive-And-Search-Engine

Artificial General Intelligence Infrastructure of "The Sacred Computer" AGI Institute: Custom Intelligent Selective Internet Archiving and Exploration/Crawling; Information Retrieval, Media Monitoring, Search Engine, Smart DB, Data Preservation, Knowledge Extraction, Dataset Creation, AI Generative Model Building and Testing, Experiments etc.
MIT License

Review, Selection, Development, Application of Virtual Machine Platform(s)/Language(s)/Compilers for Code Generation and Language-Embedding for Automatic Programming, Code Synthesis, Program Generation, Intermediate Representations Lowering: KidVM, .NET CLR, JVM, LLVM, Zig, C, WebAssembly, Julia, Lua, Python, x64/x86 ASM, C++ ...? #19

Open Twenkid opened 1 year ago

Twenkid commented 1 year ago

The AGI infrastructure is supposed to include Automatic Programming/Automatic Program Learning/Self-Programming, Code Synthesis etc. capabilities.

This relates to ideas for a direction that I call "Информатика на развитието", or Developmental Informatics/Developmental Computer Science.

There are unpublished ideas, plans, notes, diagrams and thoughts, as well as some small experimental VM/interpreter/compiler projects, which I wish to materialize and start implementing with this infrastructure.

I will reveal more of the ideas and goals in the future, along with their implementation.

Current short evaluation (28.1.2023):

One thing that can be done along this line of thought is to extend KidVM, or remake it properly, so that it serves as a dedicated code generation framework. For that purpose it would be extended with specially crafted low-level instructions and turned into a "code generation processor" (CGP) rather than an ordinary processor. That system may generate code in some of the other languages, to be compiled by their existing toolchains.

Of course, the serious VMs and compilers are already quite powerful "CGPs", given source code in the languages at their level. This one could not beat them at their own game; it could only make use of them.
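
A minimal sketch of the "make use of them" idea: a small VM-level program is translated into C source text, which an existing C compiler would then build. The instruction set (`Push`, `Add`, `Mul`), the `Instr` struct and `emitC` are hypothetical names chosen for illustration; they are not KidVM's actual instructions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stack-machine instructions (NOT KidVM's real ISA).
enum class Op { Push, Add, Mul };

struct Instr {
    Op op;
    int operand; // meaningful only for Push
};

// Translate a straight-line stack program into C source text.
// Each stack slot is kept as a C expression string; Add/Mul combine
// the two topmost slots. The result is a C function `int name(void)`
// returning the value left on the stack.
std::string emitC(const std::vector<Instr>& program, const std::string& name) {
    std::vector<std::string> stack;
    for (const Instr& in : program) {
        if (in.op == Op::Push) {
            stack.push_back(std::to_string(in.operand));
        } else {
            assert(stack.size() >= 2 && "binary op needs two operands");
            std::string rhs = stack.back(); stack.pop_back();
            std::string lhs = stack.back(); stack.pop_back();
            const char* sym = (in.op == Op::Add) ? " + " : " * ";
            stack.push_back("(" + lhs + sym + rhs + ")");
        }
    }
    assert(stack.size() == 1 && "program must leave exactly one value");
    return "int " + name + "(void) { return " + stack.back() + "; }\n";
}
```

For example, the program `Push 2, Push 3, Add, Push 4, Mul` yields a C function whose body returns `((2 + 3) * 4)`; all optimization is left to the C compiler.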

A point where the hypothetical new VM could be better would be to interpret, translate, map, represent, compile or "lower"* other types of code from other structures that are still missing, not just the well-established programming languages and ASTs. These structures, and their hierarchical, transformational and graphical parsing, would matter more to an AGI system for reasoning, generalization, search etc. than the specific lower-level languages, loop optimizations, vectorization etc. at the end of the pipeline.
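
For readers unfamiliar with the term, lowering in its conventional sense can be illustrated with a minimal sketch: a hierarchical structure (here only a small expression tree) is flattened into a linear sequence of lower-level instructions. The `Node` type, the textual instruction names and the `lower` function are hypothetical; the structures envisioned above would be richer than this tree.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hierarchical structure: a leaf holds a value,
// an inner node holds an operator name and two children.
struct Node {
    std::string op;                    // empty for a leaf, else e.g. "ADD", "MUL"
    int value;                         // meaningful only for leaves
    std::unique_ptr<Node> left, right; // null for leaves
};

std::unique_ptr<Node> leaf(int v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

std::unique_ptr<Node> binary(const std::string& op,
                             std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->value = 0;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Lower the tree to flat stack-machine text via post-order traversal:
// both children are emitted first, then the operator that consumes them.
void lower(const Node& n, std::vector<std::string>& out) {
    if (n.op.empty()) {
        out.push_back("PUSH " + std::to_string(n.value));
        return;
    }
    assert(n.left && n.right);
    lower(*n.left, out);
    lower(*n.right, out);
    out.push_back(n.op);
}
```

The tree for `(2 + 3) * 4` lowers to `PUSH 2`, `PUSH 3`, `ADD`, `PUSH 4`, `MUL`: the hierarchy disappears and only evaluation order remains.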

KidVM's "instruction set architecture" could evolve, or there could be a spin-off "CGP".
Currently it is a simple low-level CPU with some calls to external functions. The assembly mnemonics were the exact identifiers of the instructions, so that they could be translated directly. As new instructions were gradually added ad hoc, the assembler read and parsed the opcodes from the header file.
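
The header-driven approach could be sketched roughly as follows. This is an illustration under assumptions, not KidVM's actual code: the enum layout, the `OP_` prefix and the names `parseOpcodes` and `assembleLine` are all hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical opcode table: mnemonic -> numeric opcode.
using OpcodeTable = std::map<std::string, int>;

static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Scan header text for identifiers that start with `prefix` (assumed "OP_")
// and number them in order of first appearance, as a C enum without
// explicit values would. Limitation: comments are not skipped.
OpcodeTable parseOpcodes(const std::string& headerText,
                         const std::string& prefix = "OP_") {
    assert(!prefix.empty());
    OpcodeTable table;
    int next = 0;
    std::size_t pos = 0;
    while ((pos = headerText.find(prefix, pos)) != std::string::npos) {
        std::size_t start = pos + prefix.size();
        std::size_t end = start;
        while (end < headerText.size() && isIdentChar(headerText[end])) ++end;
        // Reject matches inside a longer identifier, e.g. "STOP_X".
        bool atWordStart = (pos == 0) || !isIdentChar(headerText[pos - 1]);
        std::string mnemonic = headerText.substr(start, end - start);
        if (atWordStart && !mnemonic.empty() && table.count(mnemonic) == 0)
            table[mnemonic] = next++;
        pos = end;
    }
    return table;
}

// Translate one assembly line such as "PUSH 5" into opcode + integer operands.
// Returns an empty vector for a blank line or an unknown mnemonic.
std::vector<int> assembleLine(const OpcodeTable& table, const std::string& line) {
    std::istringstream in(line);
    std::string mnemonic;
    if (!(in >> mnemonic)) return {};
    auto it = table.find(mnemonic);
    if (it == table.end()) return {};
    std::vector<int> code{it->second};
    int operand;
    while (in >> operand) code.push_back(operand);
    return code;
}
```

Because the table is rebuilt from the header on each run, adding an instruction to the enum makes its mnemonic available to the assembler without touching the assembler itself.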

To be continued...

References

What is Lowering?

Twenkid commented 1 year ago

The first choice for embedding and sandboxing in C++ seems to be Python:

// MSVC, Project, Options --> ... C++ --> Additional Include Directories:
// <Python path>\include
// e.g. for a Python installed in C:\Users\toshb\AppData\Local\Programs\Python\Python39\
// Include directory: C:\Users\toshb\AppData\Local\Programs\Python\Python39\include
// Library directory: C:\Users\toshb\AppData\Local\Programs\Python\Python39\libs
// Link against python39.lib, python310.lib etc., matching the installed version

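#include <Python.h>
#include <iostream>
#include <string>
//...
void python() { //#30-1-2023
    std::cout << "py?";
    std::string modules_path = "z:\\";
    // Initialize the Python interpreter
    Py_Initialize();
    // Make modules in modules_path importable (sys.path is a borrowed reference)
    PyObject* sysPath = PySys_GetObject("path");
    PyObject* pPath = PyUnicode_FromString(modules_path.c_str());
    PyList_Append(sysPath, pPath);
    Py_XDECREF(pPath); // the list holds its own reference

    PyRun_SimpleString("from time import time,ctime\n"
        "print('Today is',ctime(time()))\n");

    // Run a Python function: testlib.test("Greg")
    PyObject* pModule = PyImport_ImportModule("testlib");
    PyObject* pFunc = pModule ? PyObject_GetAttrString(pModule, "test") : nullptr;
    if (pFunc && PyCallable_Check(pFunc)) {
        PyObject* pArgs = Py_BuildValue("(s)", "Greg");
        PyObject* pValue = PyObject_CallObject(pFunc, pArgs);
        Py_XDECREF(pArgs);
        // PyUnicode_AsUTF8 is the public replacement for _PyUnicode_AsString;
        // it returns nullptr if the result is not a str
        const char* result = pValue ? PyUnicode_AsUTF8(pValue) : nullptr;
        if (result) std::cout << result << std::endl;
        Py_XDECREF(pValue);
    }
    // Report a failed import, attribute lookup, call or conversion
    if (PyErr_Occurred()) PyErr_Print();
    Py_XDECREF(pFunc);
    Py_XDECREF(pModule);

    // ctime and time are still available: both snippets run in __main__
    PyRun_SimpleString("def g1(c):\n  a=c/43; b=a*a; print(b)\n"
        "g1(464);\n"
        "print('Today is',ctime(time()))\n");

    // Shut the interpreter down
    Py_FinalizeEx();
}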
Twenkid commented 1 year ago

Java: JNI & FFM - Foreign Function and Memory API

31.1.2023: I am refreshing my Java knowledge and skills. Tonight I began a quick review and crash course on the changes since Java 8. I had followed some of them in the past, Java 8 in particular, but I was an active Java developer before that and considered most of the new features to be mostly "syntactic sugar" (starting with the lambdas etc.).

Regarding this strategic topic of language embedding and interoperability, during the refresh I took a quick look at JNI, the Java Native Interface, which is used to call native C/C++ libraries (DLLs).

I noticed that recent Java versions introduce a new, more robust model called the "Foreign Function and Memory API": https://openjdk.org/jeps/424 The FFM API is available as a preview API in the most recent Java 19, after being "incubated" in several earlier releases.

It may come in handy.

As for Java in general, it's fun. For now I use Eclipse.


2.2.2023: Since Java 16 there is JEP 389, the Foreign Linker API (Incubator): Java code can call C/C++ code, or vice versa, through a new API intended to replace JNI. It was later merged into the FFM API.

An example of Java-C interop and glue code from the docs: https://openjdk.org/jeps/424

// 1. Find foreign function on the C library path
Linker linker = Linker.nativeLinker();
SymbolLookup stdlib = linker.defaultLookup();
MethodHandle radixSort = linker.downcallHandle(stdlib.lookup("radixsort"), ...);
// 2. Allocate on-heap memory to store four strings
String[] javaStrings   = { "mouse", "cat", "dog", "car" };
// 3. Allocate off-heap memory to store four pointers
SegmentAllocator allocator = SegmentAllocator.implicitAllocator();
MemorySegment offHeap  = allocator.allocateArray(ValueLayout.ADDRESS, javaStrings.length);
// 4. Copy the strings from on-heap to off-heap
for (int i = 0; i < javaStrings.length; i++) {
    // Allocate a string off-heap, then store a pointer to it
    MemorySegment cString = allocator.allocateUtf8String(javaStrings[i]);
    offHeap.setAtIndex(ValueLayout.ADDRESS, i, cString);
}
// 5. Sort the off-heap data by calling the foreign function
radixSort.invoke(offHeap, javaStrings.length, MemoryAddress.NULL, '\0');
// 6. Copy the (reordered) strings from off-heap to on-heap
for (int i = 0; i < javaStrings.length; i++) {
    MemoryAddress cStringPtr = offHeap.getAtIndex(ValueLayout.ADDRESS, i);
    javaStrings[i] = cStringPtr.getUtf8String(0);
}
assert Arrays.equals(javaStrings, new String[] {"car", "cat", "dog", "mouse"});  // true
Twenkid commented 1 year ago

One of the major additions implemented in ACS in 2022 was a WebView2 control, along with ongoing extensions and experiments with custom interfaces to it. It is planned to grow into a complete browser for faster and smarter browsing, search and display, connected to the cognitive acceleration system. (...)

WebView2 is a modern web browser control, based on Edge. ACS had a browser years ago, but previously the "Assistant" used the original .NET web control, which had a very old rendering engine - IE6 or IE7? (...)

https://learn.microsoft.com/en-us/microsoft-edge/webview2/