VUnit / vunit

VUnit is a unit testing framework for VHDL/SystemVerilog
http://vunit.github.io/
Other
734 stars 263 forks source link

New feature proposal: a codified dynamic type system #521

Open bradleyharden opened 5 years ago

bradleyharden commented 5 years ago

I mentioned this on Gitter, but I think it might be best to create an issue to consolidate the discussion and make sure comments don't get lost.

I would like to develop a generic dynamic typing system. I'm drawing inspiration from the queue_element_type_t and trying to generalize and expose that as a codified interface. My goal is to provide a generic type mark that contains everything you need to know to encode and decode a value of that type. What I'm aiming for is something like this

type type_class_t is (scalar_type, array_type, variable_type);

type type_t is record
  p_name  : string;
  p_class : type_class_t;
  p_size  : real;
end record;

For scalar types, size represents the number of bytes to encode a value. For array types, size represents the number of bytes to encode a single element of the array. Variable types have undetermined size. We can provide a function to determine the encoded length of any type

function code_length(
  typ    : type_t;
  length : natural := 0)
  return integer
is
begin
  case type_class(typ) is
    when scalar_type =>
      return size(typ);
    when array_type =>
      return integer(ceil(size(typ) * real(length)));
    when variable_type =>
      return -1;
  end case;
end;

However, I will probably end up using a numerator and a denominator instead of a real, since the size is always a rational number. That way we can use integer division to calculate the size.

My initial thought was that you could declare all types as constants of this record type. However, to use it as a type mark would require that you always encode the entire record with the encoded data. Instead, it would be easier to implement it as

type type_t is record
  p_id : integer;
end record;

Then, we can use the type ID as an index into an array of string_ptr for getting the type name, an array of type_class_t for getting the type class, and an array of real for getting the size. This is the same strategy as msg_type_t.

Next, we can define an element

type element_t is record
  p_type     : type_t;
  p_range    : range_t;
  p_code     : string;
end record;

I'm not sure whether to keep the range as a range_t, i.e. bit_vector, or to store it as an encoded range. It probably makes more sense to store it as string(1 to 9). The simulator's internal representation of range_t will need to be at least 9 bytes anyway, and then the size doesn't vary based on the range.

You could convert between element_t and all other types with the functions to_element and from_element, which would essentially be light wrappers around encode and decode. The aliases for from_element could also be to_std_logic_vector, etc. to mimic a real type conversion.

With this record, we can reformulate queue_pkg to push and pop elements. The string representation inside the queue wouldn't change, but every push and pop would be forced through push_element and pop_element.

In fact, that brings up a separate point. Why is queue_t implemented with a single, variable length string? Why not implement it as a variable length integer_vector and treat each integer as a string_ptr to an element? Then you never have to use push_variable_string and you also know the how many elements are in the queue at any moment. To me, that seems like a much better way to implement it. Is there some reason the current implementation was chosen instead?

Anyway, element_t is great for pushing things around internally, using unconstrained function inputs and return values, but it isn't easy for users at the top level. There, I think it makes more sense to give them

type element_ptr_t is record
  p_type  : type_t;
  p_range : string_ptr;
  p_code  : string_ptr;
end record;

I'm not sure whether to use separate string pointers for the range and code here, or to combine them into a single string_ptr. Either way, this record is of fixed size and can be easily manipulated at the top level.

With element_t and element_ptr_t you could perform type conversions of the underlying coded data. To facilitate that, I would like to change the encoding of all base types to use the standard binary representation. For example, instead of using 'T' and 'F' for a boolean, use character'val(0) and character'val(1) That way, you could easily convert between boolean and a unit8 type. Or you could convert a real type into an array of 4 uint16 types. To accomplish this with real, we could use the to_real function from float_pkg, rather than the custom implementation right now.

Next, you can define array_t as an array of element_t, where array_t stores each element internally as a pointer to an encoded string. In effect, element_t acts as the container that @kraigher mentioned one time on Gitter when discussing arrays of arbitrary type.

Finally, you could define

type dynamic_t is record
  p_type  : type_t;
  p_array : array_t;
end record;

Using this, users could dynamically create a new type_t and append elements to the dynamic_t to define their own containers for custom records.

Also, dict_t could be modified to accept element_t and dynamic_t as keys and values. You could also create a list_t based on array_t with a focus on list operations rather than a general array type, like integer_array_t. I think there are many different directions you could take this.

Separately, I also have an idea for a bit_array_t. Its purpose would be to store elements of arbitrary bit width. Internally, the data would be stored as one large string. Each bit_array would store a range_t that indicates the size of each element. The get and set functions would pull out a sub string and decode it to set/get the desired bits. You could even change the range_t dynamically to change your view of the bit_array. This is similar to read_word and write_word in memory_t, but not restricted to 8-bit boundaries. @umarcor might be interested in this as the basis for co-simulation. I think it would be the most flexible option.

LarsAsplund commented 5 years ago

Looks interesting! The best way forward would be to take small value adding steps and see where this goes.

For the first step I would suggest that you create type_t as a record containing an id and then a new_type_t function. Once you have that you can replace the queue element type enums with

constant name_of_type : type_t := new_type_t("name_of_type");

To start with there is really no need for anything but a name property. Should consider making the ID a string to avoid some run-time encoding. At this point there is no difference between type_t and msg_type_t and maybe it should be the same thing in the end. A message type is just another type.

The second step would be to define an element_t, As you noted the unconstrained string would be problematic with VHDL-93 but does it really help to have a record with several encoded fields? Why not make it one long string.

In fact, that brings up a separate point. Why is queue_t implemented with a single, variable length string? Why not implement it as a variable length integer_vector and treat each integer as a string_ptr to an element? Then you never have to use push_variable_string and you also know the how many elements are in the queue at any moment. To me, that seems like a much better way to implement it. Is there some reason the current implementation was chosen instead?

I'm not sure but @kraigher probably knows. It may have to do with memory performance/utilization. One large string vs many small fragmented strings.

Anyway, element_t is great for pushing things around internally, using unconstrained function inputs and return values, but it isn't easy for users at the top level.

If element_t is just used internally it's not much value in doing the refactoring. We are just moving things around but also create a whole new set of to/from_element subprograms for every type.

There, I think it makes more sense to give them

type element_ptr_t is record p_type : type_t; p_range : string_ptr; p_code : string_ptr; end record;

I'm not sure whether to use separate string pointers for the range and code here, or to combine them into a single string_ptr. Either way, this record is of fixed size and can be easily manipulated at the top level.

With the range being part of the code today it's probably easier to keep them together. If so, why not keep all together?

With element_t and element_ptr_t you could perform type conversions of the underlying coded data. To facilitate that, I would like to change the encoding of all base types to use the standard binary representation. For example, instead of using 'T' and 'F' for a boolean, use character'val(0) and character'val(1) That way, you could easily convert between boolean and a unit8 type. Or you could convert a real type into an array of 4 uint16 types. To accomplish this with real, we could use the to_real function from float_pkg, rather than the custom implementation right now.

The true/false values comes from a time when we also supported humanly readable codecs for debugging. That could be changed. The real implementation is a workaround for some simulator if I remember correctly. Do we really want to encourage type casting?

Next, you can define array_t as an array of element_t, where array_t stores each element internally as a pointer to an encoded string. In effect, element_t acts as the container that @kraigher mentioned one time on Gitter when discussing arrays of arbitrary type.

If the element is in itself an encoded string this would be easy.

Finally, you could define

type dynamic_t is record p_type : type_t; p_array : array_t; end record; Using this, users could dynamically create a new type_t and append elements to the dynamic_t to define their own containers for custom records.

Also, dict_t could be modified to accept element_t and dynamic_t as keys and values. You could also create a list_t based on array_t with a focus on list operations rather than a general array type, like integer_array_t. I think there are many different directions you could take this.

Yes, there should be many possibilities. All standard data structures should have a generic variant.

Separately, I also have an idea for a bit_array_t. Its purpose would be to store elements of arbitrary bit width. Internally, the data would be stored as one large string. Each bit_array would store a range_t that indicates the size of each element. The get and set functions would pull out a sub string and decode it to set/get the desired bits. You could even change the range_t dynamically to change your view of the bit_array. This is similar to read_word and write_word in memory_t, but not restricted to 8-bit boundaries. @umarcor might be interested in this as the basis for co-simulation. I think it would be the most flexible option.

I'll have to dig more into what @umarcor have done to understand the implications. I let him respond to this

bradleyharden commented 5 years ago

The second step would be to define an element_t, As you noted the unconstrained string would be problematic with VHDL-93 but does it really help to have a record with several encoded fields? Why not make it one long string.

My original thought was that there would be a problem defining ownership. If element_t uses a string_ptr for the code, then who owns it when elements are created internally to VUnit, especially when pushing and popping? However, I think this might be solvable by carefully structuring functions and procedures to guarantee there is only one owner.

With the range being part of the code today it's probably easier to keep them together. If so, why not keep all together?

Part of my motivation here is to facilitate bit_array_t. By separating the range from the code, you can generate different views of the data. And more generally, the range and the code are separate entities to me, even if they are closely related. It might make sense to store them together, but we should still have functions get_range and get_code to pull them out separately, if required.

I've actually already completed and tested a refactoring of codec_pkg and friends in this regard. I created new encode procedures in codec_builder_pkg, and I migrated all of the encoding algorithms there. I also changed all the decode procedures in codec_builder to only decode the actual data, not the range_t. Decoding the range_t is now handled by the encode and decode functions in codec_pkg.

One of my main motivations here was to bring symmetry to the structure of the two packages and to keep the algorithms for encoding and decoding a data type adjacent to each other in the same package. It was a real pain when you had to flip back and forth between packages to see how something was encoded and decoded. I also took the opportunity to clean up some of the algorithms. For example, I greatly simplified the algorithms for std_ulogic_array.

Do we really want to encourage type casting?

I don't want to encourage it, but I think all tools are valuable if used appropriately. I'm sure there are some valid use cases. However, it would be reasonable to issue strong warnings about it in the documentation.

I'll put together a useful addtion and make a pull request.

bradleyharden commented 5 years ago

Also, you didn't answer one of my Gitter questions. Some of my changes to codec_pkg and codec_builder_pkg broke the code generation tools for the old com library. Is that ok? Can we remove those now?

umarcor commented 5 years ago

I'm not sure but @kraigher probably knows. It may have to do with memory performance/utilization. One large string vs many small fragmented strings.

This is coherent with some comments in the sources of string_ptr_pk and integer_vector_pkg, where the size of the allocated buffers is increased 'speculatively' to avoid too many allocations.


This is similar to read_word and write_word in memory_t, but not restricted to 8-bit boundaries. @umarcor might be interested in this as the basis for co-simulation. I think it would be the most flexible option.

I'll have to dig more into what @umarcor have done to understand the implications. I let him respond to this

I must admit I'm a bit overwhelmed with this issue. I have not dig into queue and codec sources yet, so I'm blind. Anyway, I'll do my best:

Currently, string_ptr and integer_vector_ptr are implemented as a pointer to an array of pointers to the type (character or integer). A variable is used to track the number of allocated elements. My proposal (#507) is to add a level of indexing: use a record that contains three pointers to arrays (of pointers), and three variables to track the number of allocated elements:

  type ptr_storage is record
    id    : integer;
    ptr   : integer;
    eptr  : integer;
    ids   : storage_vector_access_t;
    ptrs  : vava_t;
    eptrs : evava_t;
  end record;

This is all built on top of a new type, type storage_mode_t is (internal, extfnc, extacc);, which is also a new option/parameter added to new_string_ptr and new_integer_vector_ptr.

When a new vector is requested, a new element is added to ids and id is incremented. If it is of type internal or extacc, the other vectors/variables are updated accordingly.

Then, for each get, set, length, resize, etc. a 'case when' is used to do up to three different actions (depending on the type).

NOTE: ptrs and eptrs are fundamentally the same. The difference is who 'calls malloc', VHDL/GHDL or C/Python. Unfortunately, explicit separate types are required. See ghdl/ghdl#797.

NOTE: allocation of type extfnc are only identified by an id. No array type is managed at all. Instead, resources are used through read_char, write_char, read_integer, write_integer (which are functions expected to be defined in C/Ada, but that can be implemented in VHDL). This is meant to allow the user to implement any custom function, which might even connect to some remote board/server/device.

This is implemented in #507 for string_ptr (and byte_vector_ptr as an alias) and in #476 for integer_vector_ptr. However, @bradleyharden, I think that none of these issue conflict with the changes you propose. These are lower in the abstraction stack. Since the mode defaults to internal, you should find absolutely no functional difference. Nevertheless, I think it is important for you to know this context.

Then comes #470, which tries to extend the mechanism to memory_t in verification_components. As a matter of fact, this is the feature that motivated the series of PRs (see #462). This is where the fun begins!

NOTE: the code proposal in #470 is not valid yet.

Please see this comment by @kraigher, where he explains some very basic details of the memory model which I didn't know at the moment. Then, from my last comment there:

  • Data and metadata in VHDL.
  • Data in an external byte array and metadata in VHDL.
  • Data in VHDL and metadata in an external integer array.
  • Data and metada in an external integer array.

Hence, I believe that these same possibilities might apply for any other type that is built on top of either string_ptr or integer_vector_ptr. Each time you allocate one of them, you can decide to do it internally, as an external access or to be used through functions.

However, I don't know when is queue_t used, and I think I'm mixing it with this comment. Is it possible that when @kraigher said that verification components interact with the memory model, he was referring to memory-mapped verification components? I.e. stream verification components use queue only and do not interact with memory at all. If so, I would say that queue 'must' support external models to be coherent with memory; so that verification components of any kind can be externally loaded.

Precisely, I use AXI Stream VCs quite a lot. When data is allocated externally, I have a simple for loop in the testbench that pushes it to an AXI Stream Master, and another loop that pulls results from an AXI Stream Slave. This is not a problem due to the streaming nature of the flow. I.e, I can write/read a value per clock cycle, because I know that they are sequential. With a (RAM) memory, I need to copy everything before the actual testbench starts.

Nonetheless, after this discussion, I believe that copying data from an external buffer to an AXI Stream verification component is not required. It should be possible to tell the verification component that data is already allocated and how many data elements are available.

Regarding the 'codified dynamic type system', I'm getting lost with the 'codified' term. For me, such a feature should allow the user to define a record type in VHDL and an equivalent struct in C. Then, allocate an array of size number_of_elements*bytes_per_struct bytes, and have C access it through casting and GHDL use it as the 'memory' of a queue. I feel that you have tried to answer and explain this explicitly in the first comment, but I am not being able to understand how do you 'cast' an string(1 to 9) to a custom record type in VHDL.

Last, but not least, the external models for string_ptr and integer_vector_ptr are almost ready, but there are some flaws to solve yet. The most important one is that the length is provided when the vector is created and it cannot be modified for external models; Resize, reallocate and deallocate are not implemented for external models. With models of type extacc, it is technically easy for either VHDL or C to do the reallocation, resizing, etc. However, with extfnc, it can be impossible.


As a side but related note, I think that currently vectors are given an index when they are allocated and the index is never reused, even if the vector is deallocated. As a result, the user is 'responsible' of allocating a sensible number of pointers and reusing the identifiers for an optimal solution. I believe it could be an interesting enhancement to keep track of the deallocated indexes without adding other three vectors to ptr_storage.

EDIT

Or you could convert a real type into an array of 4 uint16 types.

My current proposal is to only implement string_ptr (1 byte) and integer_vector_ptr (4 bytes), because that's already quite a lot of code duplication (4 files are almost identical). However, if you find it useful, it can be implemented for a type of 8 bytes too: https://ghdl.readthedocs.io/en/latest/using/Foreign.html#restrictions-on-foreign-declarations

LarsAsplund commented 5 years ago

@bradleyharden

Part of my motivation here is to facilitate bit_array_t. By separating the range from the code, you can generate different views of the data. And more generally, the range and the code are separate entities to me, even if they are closely related. It might make sense to store them together, but we should still have functions get_range and get_code to pull them out separately, if required.

Let's see how it turns out. In this case there are no VHDL-93 issues

I'll put together a useful addtion and make a pull request.

Great

Also, you didn't answer one of my Gitter questions. Some of my changes to codec_pkg and codec_builder_pkg broke the code generation tools for the old com library. Is that ok? Can we remove those now?

Yes, we've supported it long enough. I think we should make a commit where all deprecated stuff in com, logging, check, and run is removed.

bradleyharden commented 5 years ago

@LarsAsplund, I remembered the issue that motivated my distinction between element_t and element_ptr_t.

I think it would be elegant and unifying to use some version of element_t as the fundamental unit that queue_pkg operates on. However, queue_pkg always pushes the basic types by value. But if element_t is defined in terms of a string_ptr, pushing by value becomes awkward and will probably have poor performance. Separating element_t and element_ptr_t solves this issue and provides an elegant implementation of queue_pkg

Suppose a user wants to push an integer and that queue_pkg only operates on instances of element_t. In that case, push_integer would have to create a new, dynamically allocated element_t. The element would be passed to an internal procedure of queue_pkg, where the string would be copied into the queue. Then push_integer would deallocate the element. pop_integer would have to do the same. Thus, for a single push and pop, you have two allocations and deallocations.

There are a few VHDL-93-compatible alternatives here, but I don't think either is good.

With VHDL-2008, this is a non-issue. element_t would be based on string rather than string_ptr. All of the push functions could create new elements without a new allocation, and the problem is solved.

** @umarcor I'm not totally sure that queue_t was implemented as a single string to avoid too many allocations. The comment in queue_pkg refers to copying. If every new allocation requires that you copy all the existing data, then you don't want to allocate too often. However, the problem would be much different if queue_t were implemented as an array of integers representing string_ptr. Each element would require an allocation, so the total number of allocations would probably go up, but the allocation of each element doesn't require copying. More allocations is a disadvantage, but there would also be a benefit. The queue itself would have to be resized less often, because there is only one integer per element, rather than many characters per element.

If we were to change the implementation of queue_t, then the pass-by-reference alternative above would actually fit well. You could define element_t as a single string_ptr with functions to extract the type_t, range_t etc. When pushing an element to a queue, you would just set the next queue string_ptr equal to the element string_ptr. As far as I can see, this is the only elegant way to do it in VHDL-93. @LarsAsplund, do you have any thoughts on this approach? I realize it's pretty drastic. But at the same time, it wouldn't change any interfaces.

bradleyharden commented 5 years ago

I just took a deeper look at queue_pkg and I think there is room for improvement in the implementation. Right now, the string buffer is linear. When trying to push would overflow the buffer, a new, larger buffer is allocated and old data is dropped.

Changing the implementation to use a circular buffer would cut down on the number of times you need to resize. There would be a little more bookkeeping internally, but I think that would be minor relative to the benefits of a circular buffer. It would probably also make sense to stick to power-of-two buffer sizes.

I might try to re-implement queue_t myself, with the alternative mentioned above, and see what kind of performance I get.

bradleyharden commented 5 years ago

I had another realization about the current implementation of queue_t. When a queue is initialized, the string_ptr length is 0. Here is the line where the queue is resized.

resize(queue.data, 2 * length(queue) + 1, drop => head);

Suppose a 24-bit std_ulogic_vector is the first element pushed to the queue. The coded form has four fields, a 1-byte type, a 4-byte integer for the string length, a 9-byte range, and 12 bytes of data. That's 26 total bytes.

To push that string to the queue, the current implementation has to resize itself five times, from 0 to 1, 3, 7, 15, and finally 31 characters. It gets better once the string is of a reasonable length, but it has to reallocate and copy itself several times to bootstrap to a reasonable length. Just initializing the string to some reasonable length could increase performance.

I'd like to get @kraigher's thoughts on this when he returns.

bradleyharden commented 5 years ago

Sorry for the spam, but I want to jot these ideas down while I have them.

We can extend the reformulation of queue_t further. Define element_t as a string_ptr, then define array_t as a dynamic array of element_t. You can then base both queue_t and dynamic_t on array_t. The former is used when you want to consume and deallocate the elements as you go. The latter is used when you want to read the values multiple times. Maybe dynamic_t should be named container_t? Either way, you could have a function to cast a queue_t as a container_t and vice versa. Then you could change msg_t to accept and provide either format. That way you could get a message, convert its data to container_t and iterate over the data multiple times without having to store it yourself. It would also allow you to see how many elements are in a message before you use the data.

This change would also require that we modify the payload element of message_t, but I don't think it would be a big deal.

bradleyharden commented 5 years ago

@LarsAsplund, I now have a working implementation of queue_t that uses an array of string_ptr_t rather than a single string. It also uses a circular buffer, rather than a linear buffer. Furthermore, I updated the implementation of string_ptr_pkg to keep track of deallocated reference indices and reuse them. I wanted to minimize the number of times that ptrs, the string_access_vector, is resized, since there will be a sharp increase in the number of string_ptr_t allocations with this new strategy. I plan to do some performance tests comparing my implementation with the current one. I'll post the results.

umarcor commented 5 years ago

@bradleyharden, did you do those changes on top of #507 or independent to them?

bradleyharden commented 5 years ago

No, I did not consider it. It wouldn't be hard to merge them though. All I did is add an integer_vector to use as a stack for old pointer reference indices.

The changes are just meant to get me to the point where I can test the performance difference between implementations of queue_t. We'll see what happens after that.

bradleyharden commented 5 years ago

@LarsAsplund and @kraigher, I have some preliminary performance tests of my new implementation of queue_t.

My version of queue_t uses an integer_vector_ptr for the data, where each integer is a string_ptr to a pushed element. I also used a circular buffer, rather than a linear buffer like the current implementation. I think my implementation has a number of advantages:

So far, it looks like my implementation has better performance overall.

I've run two types of tests below. Here is an example of the "Push then pop" test

for i in 0 to 2**16 - 1 loop    
  push(queue, 0);    
end loop;    
for i in 0 to 2**16 - 1 loop    
  int := pop(queue);    
end loop;

And here is an example of the "Push/pop" test

for i in 0 to 2**16 - 1 loop
  push(queue, 0);
  int := pop(queue);
end loop;

Here are the results for the master branch

==== Summary ======================================================================
pass vunit_lib.tb_queue_performance.Push then pop 2^16 integers (0.5 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^18 integers (1.3 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^20 integers (4.7 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^21 integers (9.2 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^22 integers (18.2 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^16 integers      (0.5 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^18 integers      (1.4 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^20 integers      (5.2 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^21 integers      (10.0 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^22 integers      (20.1 seconds)
===================================================================================
pass 10 of 10
===================================================================================
Total time was 71.2 seconds
Elapsed time was 71.2 seconds
===================================================================================
All passed!

And here are the results for my implementation

==== Summary ======================================================================
pass vunit_lib.tb_queue_performance.Push then pop 2^16 integers (0.4 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^18 integers (1.0 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^20 integers (3.6 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^21 integers (7.2 seconds)
pass vunit_lib.tb_queue_performance.Push then pop 2^22 integers (14.8 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^16 integers      (0.4 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^18 integers      (1.0 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^20 integers      (3.5 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^21 integers      (6.9 seconds)
pass vunit_lib.tb_queue_performance.Push/pop 2^22 integers      (13.5 seconds)
===================================================================================
pass 10 of 10
===================================================================================
Total time was 52.4 seconds
Elapsed time was 52.4 seconds
===================================================================================
All passed!

So far, my implementation is better in every test. The advantage of a circular buffer is especially evident in the "Push/pop" tests. My implementation never has to resize the buffer, but the current implementation has to resize it regularly.

Are there any other performance tests you would like me to try?

With your permission, I would like to clean up my implementation and push it. The new queue_t implementation will also facilitate my implementation of element_t in a manner that is compatible with VHDL-93 (see my rambling posts above). When it all comes together, I think the new typing system will be really elegant.

umarcor commented 5 years ago

@bradleyharden, that sounds really good. I'm looking forward to read the PR!

abaebae commented 6 months ago

Great ideas, but it seems like nothing has happened for a few years?

LarsAsplund commented 6 months ago

No, this has been sleeping for quite some time but I think it will be revisited as we continue with our Python integration in #986.