Api Design targets 2.0 discussion

EFeru / DbcParser

.NET CAN bus dbc file parser

MIT License

78 stars 28 forks source link

Api Design targets 2.0 discussion #74

Open Uight opened 3 months ago

Uight commented 3 months ago

This issue is to be used to discuss how the API of the DBC Parser should be changed to allow max usability across multiple use cases.

For me the use case is parse the dbc once and then use this data to send and receive can and can fd farmes. The only real requirement for this is that the receive stuff should be fast (fast enough to handle more than 1000 messages per second optimally sequential but i would also parallize it if thats not reachable) Send must not be that fast as normaly we would only send a few messages with cycle times of around 10ms or less. Maybe in the ballpark of 200 messages per second.

Additional requirements:

I need the pack functions and unpack functions to work with byte arrays.
I need extended mutliplexing to be supported.
(Bonus: i dont really like properties that could be null in an API => see current custom properties)

For this i see some possibilities.

provide the basic functions and let the user make it fast
Change the api so it is "fast" right away
Keep the api for the normal dbc stuff and move all packing and unpacking to a "seperate" lets say namespace where everything is optimized for speed.

From my test with the benchmarker for the packing and unpacking i know that you can not afford to calc any properties everytime. There are some properties relevant in this for receiving especially. Does the signal have a scaling (factor or offset) if not dont calc it. Does the message i receive contain multiplexed signals? And is a specific signal multiplexed.

To achieve this i would consider several options:

convert the dbc to a packer with a function like DBC.CreatePacker(). The packer then holds all data internally in an optimized way and allows for functions like. Packer.UnpackMessage(uint ID, byte[] data, out dictionary<string, double> values);

Big contra in this is that you pretty much have all classes duplicated (just like currently with teh immutable stuff). This introduces more maintainace and probably some inconsistensies with every change. Pro: No api change. Probably the fasted way possible but i wouldnt say you need every grain of performance as then your probably using the wrong system anyway

Keep the packer seperate and use the dbc in the packer. This would require the API of the dbc to change as dictionaries are needed for speed. Contra: Api changes, only reasonably fast Pro: no duplicated classes

Ill write more on this if more stuff gets to my mind ;)

Adhara3 commented 3 months ago

So we are saying that this lib is providing 2 functionalities

parsing a dbc file which should be accurate (sure also fast if possible, but accurate is the most important thing here)
reading/writing data based on dbc spec and this should be fast

To me these are two completely different things and should be treated as such also because it would allow parallel coding on different areas This is why I would prefer solution 1 above

convert the dbc to a packer with a function like DBC.CreatePacker(). The packer then holds all data internally in an optimized way and allows for functions like. Packer.UnpackMessage(uint ID, byte[] data, out dictionary<string, double> values);

Sure it does imply some duplication but this duplication would probably fulfill the SRP. Moreover:

for pack/unpack only a tiny subset of info is needed (e.g. no metadata, names, nested structures)
dbc may have the need to be browsable or if you will to search stuff or to show stuff in a UI which is way different than providing a fast pack/unpack capability. Please note that when I say this I mean that you may end up with a class that has duplicated properties to accomodate different usages, which would make code unreadable
pack and unpack functionality if extracted may get rid of properties and favor classes/strategies (e.g. Does the signal have a scaling (factor or offset) if not dont calc it the latter sounding like a NoScalingStrategy)
The extract pattern could allow in future a .Edit() method that provides a mutable copy suitable for editing
Some code duplication should be taken into account anyway because dbc it's a sequential format so during parsing having mutable objects to update is a must (also think about duplicated lines, custom properties definitions with multiple value definitions, etc...). In this scenario working with objects having public getters and setters is useful and practical. Sure those classes should be marked internal because their mutability requirement is an internal requirement. Even more on this: when parsing it may be convenient to read and extract strings only, so treat "100" as a string not an int. The int.Parse conversion could happen in a second phase, this would imply that internal parsing objects and public API object have different data types which again would smooth the duplication part

About this

Additional requirements:

I need the pack functions and unpack functions to work with byte arrays.

I need extended mutliplexing to be supported.

(Bonus: i dont really like properties that could be null in an API => see current custom properties)

1 and 2 are feature requests and/or bug fixes so to me they really must be done, but they do not belong into the design category even if some API signatures may change (e.g. from ulong to byte[]). The third point: the reason for that to discriminate between properties that do not had a value in file (null) and properties that did have a value (which could be string.Empty, a usualy valid value). I say this also because there ware a feature request from a user to discriminate between a value being 0 or null (I think it was about CycleTime). I also hate null checking but null here has a business meaning that would be lost if using string.Empty.

More specifically about the CustomProperties the implementation came from here and tries to handle the fact that custom properties are of <T> which makes interfaces a bit useless. I am open to different solutions as long as we keep the same functionality, feel free to suggest.

My 2 cents A

Uight commented 3 months ago

Assuming we would go with seperate objects in the parser/packer part then firstly if would move the packer to a different namespace or even a seprate project in the solution? Would that mean two nugets? Or shoudl we keep everything in the one solution?

Another question would be how to handle data that should be accessible in the packer for example lets say multiplexing and as a second example if a signal has limits. Both informations could also be possible relevant for the normal use. The question boils down to should the parser only have a subset of informations of the parser with maybe a different composition for speed or is it valid to have the information if a signal has limits only in the parser and calc them when these objects are created. I would lean to having the packer just using a transformed subset of the original with no additional data meaning that all properties the packer needs are available in the parser.

For the parser itself there is the question how and when to create a usefull datastructure. For example i would consider a usefull datastructur for multiplexing/extended multiplexing to be:

Message knows if it contains multiplexed signals or not; Signals knows if it is multiplexed or not via a enum (None, Multiplexor, Multiplexed, MultiplexedMultiplexor) Every multiplexor(also MultiplexedMultiplexor) should know which signals are multiplexed by it and with what value or range(s). Every mutliplexed signal should know its own Multiplexor and the value or range(s) at which it is expected.

The structure above is impossible to generate while parsing and can only be generated when parsing is done in a final step. This leads to the question if the parser should also have two sets of classes just as it is now. One temp calss for parsing which is the current normal classes and one final class (which is the current immutable classes)? The packer would than have classes based on the immutable classes but further optimized. From a SRP prespective the temp classes are for parsing the packer classes for packing but the normal immutable classes are for? (the user?).

As it currently stand many of the useful properties like multiplexing is done via extension methods. which kinda solves the problem of having no final point in time where these stuff can be calculated. However this is inconsistent across different stuff. E.G. IsInteger is calulated during parsing. What would be the solution here? I woudl rather have calucalted properties for that than using extension methods. Yes that means that the props are calculated new with every call but it wouldnt leed to inconsistenscies with when extension method was actually called.

Adhara3 commented 3 months ago

Let me point out that if we are talking abount a 2.0 version then a API breaking change can happen. I would completely remove the Packet/Unpacker static classes/helpers and do something like

dbc.CreatePacker();

which returns an object (this is pseudocode!) from which you can pick the requested signal and call unpack on it

packer.ReadSignal(signalName, data); // data is the byte[]

This would hide the

complexity of indexing, implementation detail
which class is used internally: when internally perform a lookup for signalName you do not need to get an object of type Signal (the current one) but it could be a SignalReader which has been build from a regular Signal during the CreatePacker() call. This would keep the public API very thin which has the advantage to be less prone to changes.

The main point is: Signal/message properties should be there for browsing the dbc only. If we don't allow the packer to request those info, we may simplify things. But if you refer to keep the packer external and passa a regular Signal to that, then Signal should be optimized. This latter solution can work too, it just sounds a bit more risky to me. But if you prefer, I'm fine with that

As an idea:

For more details we need to write some code Cheers A

Uight commented 3 months ago

before i would start on the packer stuff i would still want to define the "End of parsing:freeze" stuff.

Should we use the existing Immutable Classes for that and how should they be named? Would this mean the current classes would be switched to internal?
Should the immutable dbc have a reference to the original mutable object? This could allow for regenerating the data after changes are applied. If yes should only the main dbc file contain its original or should every signal/message/node hold a ref to its origin?
How should the immutable stuff be triggered. currently theres the commented out code in the Builder that does it seperatly for all. I would more like to have on entry point for this on the dbc class level.
Which properties should the immutable dbc have in addition to the originals? Readd the extensionmethods as properties to the immutable stuff or change the extension to extend the immutable stuff?

as for the splitting of packing and unpacking i would only create one class that does both. Depends were you see your SRP. (For me its message handling which includes packing and unpacking.

Adhara3 commented 3 months ago

I'll create a branch tomorrow but basically public class Signal ==> internal class MutableParsingSignal and public class ImmutableSignal ==> public class Signal, same for Message
I wouldn't do that now
I prefer the builder trigger. We call .Build(), It knows the parsing is done, finalizes what need to be finalized, optimizes what needs to be optimized, I do not want DBC to mess around with mutable/immutable stuff, I want to avoid race conditions, the fact that we need mutable objects it's a builder implementation detail. But just to be clear, this is my opinion, if the majority here dislikes let's do another way. To me it's important that this concepts is not part of the public API
See the branch tomorrow, but not many, the majority is a type replacement e.g. string Multiplexing becomes MultiplexingInfo Multiplexing

Goodnight A

Adhara3 commented 2 months ago

In the poc/immutability branch there is a strating point on how I would do it. It is not complete, it is just to get an idea.

In particular I would focus on:

public immutable classes can have more complex data types, e.g. public MessageId ID { get; }
I removed some extension methods e.g. now public bool IsMultiplexed => Signals != null && Signals.Any(s => s.Multiplexing.Role == MultiplexingRole.Multiplexor); which is more optimized than before because under the hood it is not parsing the multiplexing string anymore. Could be improved even more, that was out of scope now
For signal public MultiplexingInfo Multiplexing { get; } no longer a string. I just reused what we have, but the type could be improved
DbcBuild.Build() is the moment where the parsing is done, all has been loaded and we can calculate stuff so that the Dbc is created. The Dbc itself is not aware of this, nor are the immutable classes.

If this was my codebase, I would go down this way. If you do not agree, I'll just delete the branch and you can go the way you like.

Cheers A

Uight commented 2 months ago

@Adhara3 for the current version i have some remarks; Some where allready present before just want to list them to. And obviosly you only touched message and signal so im focusing on that.

First thing i wanted to open was the Signal class but theres not Signal.cs. I thing Signal is now the main Signal class the parsing stuff is only internal. Two solutions (Split files into two files or use Signal as the main Name)
I dont like the mix of where dependend properties are calculated e.g: Signal.IsInteger => Done is parsing (and only dependend on one value which is incomplete) Signal.InitialValue => Done in ParsingSignal Message.IsMultiplexed => Done in (Immutable)Message (as calculated property) Signal Motorola/Msb etc stuff => Aliases for ByteOrder only done through extension methods

=> my favorite solution would be doing all that in Message or Signal Build.
TryFixExtendedId is now only used in UnitTests
CycleTime is non nullable with default being 0 i would like it to be -1 because its an int. Or change it to uint. Also CycleTime should also depend on "GenMsgSendType" being in any way cyclic. (might be an issue with custom properties aswell as the cycle time is alway added too all messages if set
Add a ///Summary to MessageId.Value and DbcId explaining the values as its unclear which is which without seeing the code
I dont like some classes in model having enums defined at the bottom. i would consider moving them to an enums file
Receiver in Signal is not properly readonly
I would create an MultiplexingParsing object containing the current string the role and the new string or object for extended stuff which is part of the parsingSignal. and then i still dont see a good way to hooking into CreateMessage for building the MultiplexingInfo across as Signals. The final object for multiplexing must exist in each signal before an individual signals multiplexing is parsed or the signal parsing must be done in a specific order

Adhara3 commented 2 months ago

I thought we were talking about design. Anyway:

Moved every class in its own file
Motorola/Msb as Extension was just a way to allow accessing the same info in two different but equivalent readable ways i.e. .Msb() and .Motorola(). ByteOrder to me was very hard to remember what 0 or 1 stand for. I added an enum, se if you like more. In this scenario extension methods are the best solution to me to avoid cluttering the class with tons of properties.
Removed anyway
Current behavior replicates CANDb++
Removed DbcId to avoid confusion
See point 1
Has to be refactored anyway to contain nodes, not strings. Noted.

Abou tpoint 8. Out of scope here, IMHO. I agree with you and your idea of having a smart object somewhere to pack&unpack structured multiplexed data,

I would not put that into Dbc which should stay a browsable file representation
I would tackle this aas parte of the data handler, i.e. Packer replacement

My idea is to have something like (psudo)

var reader = message.GetReader();
byte[] data = can.ReadFrame();

reader.Read(data, (signal, value) => {
  // The reader has been built to dinalycally adapt to message, it knows about muxors and 
  // so here we get a callback for each signal actually in the message based on muxor values
});

Today to call Packer.Unpack(data, signal) we need to have a way to have a list of signals which is dynamic depending on muxors, so we need to expose the muxors and their values and the combinations, instead the above approach allows to not expose the logic to the user, we keep it internal. I really want to point out that I am willing to solve the issue, the problem for me is scheduling, I can't work full time on DBCParser so we have to set priorities and do one step at time so that we do not get mad with merging and code reviews.

So, let's first release a version with the immutable part that fixes some issues and improves the code, small changes, a little step:

global/root properties
messages as (readonly) dictionary
the immutable class provides types separation (so you get some performance improvement, e.g. IsMultiplexed is faster now)
we add your basic advanced multiplexing which is not solving but at least the info is exposed

This won't be the final version, won't fix and solve everything but at least we close something and move forward. This is also the only sustainable way for me to manage this project in the context of my main professional activity.

To start with something on the new Pack/Unpack instead of messing around with the Dbc class we could create separate object that you could start using and testing and we use that as a sandbox.

If this approach does not suit your needs, please @EFeru make @Uight a collaborator so that he can go on on his own.

Peace A

Adhara3 commented 2 months ago

My proposed timeline:

Let's first release what we have now on main branch
then we merge multiline support on master (if @Uight will) and release just that big feature
finally we create a branch for milestone API 2.0 in which I merge immutability, @Uight puts ext mux and then we see.

Can we agree? Cheers A

Uight commented 2 months ago

@Adhara3 i can agree but you might want to do:

make a release now (1.7) main feature being byte based packing/unpacking with the bugfixes we had (signal parsing error and whatelse?)
bring global properties (#70) in and fix (#81) for another release (1.8)
then we can see if multiline support works (ist a poc after all. It seems to work but i have no idea how many edge cases there are). We can also skip that here and go to 4.
merge API 2.0 base implementation as a 2.0 release
add extended multiplexing and message packing/unpacking with multiplexing support which is another API change (in this case more of a update)

(3 could be pretty much done whenever it fits best)

Adhara3 commented 2 months ago

Before point 4, especially before releasing API 2 we need to see the code and further discussing on how to read write.

Uight commented 1 month ago

So recap targets for API 2.0 are:

Use immutable classes as output type but name them normal and makle the current classes (editable) internal
Move Packer from being static to an actual object by introducing a method like (dbc.CreatePacker) which internally creates a efficient class for packing/unpacking (e.g. using dictionarys and not list as in the public api).
introduce extended multiplexing
create message packing/unpacking possibilities with extendended multiplexing in the packer/unpacker
Split custom property assign from specific (explicitly assigned to a node/message/signal) and "global" (in lack of another word) as described in #93. (i mentioned that aswell in the ticked where the global custom properties (not assigned to anything) where discusses)

Unsure about #92 and #91 for now..... (all other current tickets should be solved with 2.0)

What else? or more specifics?

(And then parser rewrite after that?)