CBielstein / APRSsharp

APRS# - Modern APRS software for the amateur radio community
MIT License

Use Regex for Packet Parsing #82

Closed CBielstein closed 3 years ago

CBielstein commented 3 years ago

Description

This PR switches semi-implemented packet parsing from substrings to regex. This should allow a much more maintainable and reusable structure for parsing packets.

The regular expressions are structured in a way to allow reuse of individual components of the information field, which helps minimize repeated code.
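As a rough sketch of what that reuse can look like (the constant names and the second pattern below are hypothetical and chosen only for illustration; the real definitions are in RegexStrings.cs and may differ), small pieces can be declared once and concatenated into larger patterns:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical component names for illustration; the real ones in RegexStrings.cs may differ.
const string Gridsquare = @"([a-zA-Z0-9]{4,8})"; // 4 to 8 alphanumeric characters
const string Symbol = @"(.{2})";                 // any 2 characters

// Each larger pattern is built from the shared pieces instead of repeating them.
const string GridsquareWithSymbol = Gridsquare + Symbol + "?";
// Hypothetical second consumer: the same pieces followed by a space and free-form text.
const string GridsquareThenComment = "^" + Gridsquare + Symbol + @"? (.+)$";

Match position = Regex.Match(@"JN18du\L", GridsquareWithSymbol);
Match withComment = Regex.Match(@"JN18du\L Eiffel Tower", GridsquareThenComment);

Console.WriteLine(position.Groups[1].Value);    // JN18du
Console.WriteLine(withComment.Groups[3].Value); // Eiffel Tower
```

Because both patterns share the same `Gridsquare` and `Symbol` pieces, a fix to either piece applies everywhere it is used.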

If accepted, this closes #70.

Changes

Validation

CBielstein commented 3 years ago

A question: when it comes to the GroupCollection objects in Packet.cs and the other file, I see you have used different indices for the groups in match.Groups[index].Value. Is this informed by the representation of the actual information that will be returned? I was a bit confused there.

@eddywine Great question. The short answer is that it comes from the structure of the regular expression itself.

The longer answer is that a complex regex string can contain multiple sections, each of which may match a different part of the input. Take a look at the expressions in RegexStrings.cs. Each section in parentheses is a "group" of matches, which allows us to view the separate pieces of a packet info field.

For example, the regex for a Maidenhead gridsquare plus optional symbols looks like this: ([a-zA-Z0-9]{4,8})(.{2})?. This tells us that we're looking for:

  1. 4 to 8 alphanumeric characters: ([a-zA-Z0-9]{4,8})
  2. Optionally followed by any 2 characters: (.{2})?

When we compare this to a string which matches our criteria, .NET's regex parsing gives us the three things we're very likely to care about:

  1. The full string that matches (e.g. JN18du\L)*
  2. The first group from our regex: the 4 to 8 alphanumeric characters: JN18du
  3. The second group from our regex: the last two characters: \L

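Those three above are the match groups, which .NET exposes in that order as match.Groups[0], match.Groups[1], and match.Groups[2]. That is where the different indices come from: index 0 is always the full match, and each later index corresponds to a parenthesized group, counted from left to right. This allows us to see the full match or break the string up for further processing.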
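Here is a minimal sketch of the example above using the pattern exactly as quoted. The variable names are made up for illustration and this is not the code from Packet.cs. It also shows what happens when the optional second group is absent:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern as quoted above: 4 to 8 alphanumerics, optionally followed by any 2 characters.
string maidenheadPattern = @"([a-zA-Z0-9]{4,8})(.{2})?";

Match match = Regex.Match(@"JN18du\L", maidenheadPattern);

Console.WriteLine(match.Groups[0].Value); // full match: JN18du\L
Console.WriteLine(match.Groups[1].Value); // first group: JN18du
Console.WriteLine(match.Groups[2].Value); // second group: \L

// When the optional group does not participate, it still has an index,
// but Success is false and Value is the empty string.
Match noSymbol = Regex.Match("JN18", maidenheadPattern);
Console.WriteLine(noSymbol.Groups[2].Success); // False
```

Checking Groups[n].Success before reading Value is one way to tell "group absent" apart from "group matched an empty string".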

I tried to document the corresponding match groups in comments in the RegexStrings.cs file for some clarity, though they're still somewhat confusing because they're far from where they're actually used. In a future refactor of the much-too-large Packet.cs, I'm hoping to bring the regex strings closer to where they are used to make this clearer.

*Of note, the string JN18du\L specifies the location of the Eiffel Tower with an icon of a lighthouse. JN18du is the Maidenhead gridsquare of the tower, and \L is the APRS icon code for a lighthouse.