fragmatyc / SDFL

Simple Data Fix Language
1 stars 2 forks source link

CodeSplitter breaks LSP #4

Open fragmatyc opened 8 years ago

fragmatyc commented 8 years ago

Review the code splitter package as it currently breaks the Liskov Substitution Principle.

This package needs to be refactored.

Cpt-xx commented 8 years ago

Okay... I will take a look later to see what can be done.

fragmatyc commented 8 years ago

Here are more details...

CommentsWithdrawalCodeSplitter is used to remove comments. I don't think it should be considerred as a splitter... The purpose of a splitter is to encapsulate the logic behind splitting the code into smaller chunks.

For example, when parsing the code in the SDFLInterpretter, we first remove the comments using the CommentsWithdrawalCodeSplitterParameters class, then we configure the SimpleDelimiterCodeSplitter class to split on carriage returns to remove the new line characters. After, it's configured to split on the semi-colon character to split the cleaned (no comments, no carriage) code. Once each statement code is parsed, it's tokenized (again, with SimpleDelimiterCodeSplitter, but configured to split on spaces).

Both code splitters inherit from CodeSplitter, but the implementation cannot be used independently, thus telling me it's either implemented the wrong way, or they don't share enough information to be both "Code Splitters". Perhaps a higher level class should be created representing both the CodeSplitter class and the CommentsWithdrawalCodeSplitter.

Cpt-xx commented 8 years ago

CommentsWithdrawalCodeSplitterParameters seems to me to be a filter (in Java8 terminology) and SimpleDelimiterCodeSplitter sounds like a map and I think that is what we should look into.

Now I do fail to see what this has to with the LSK principle, but that may be my still limited knowledge of the code.

The first question on other projects would be "could you supply me with a testcase proving the problem?" which would fit the intended TDD way of working. So: could you supply me with a testcase proving the problem? And some instructions on how to run the test in the current setup. As I wrote: I'm used to working with the tests being part of the development project :smile:.

fragmatyc commented 8 years ago

LSP says that classes under the same branch of abstraction should be usable using the same reference type, no matter the implementation, meaning in this case that I should be able to have a reference of CodeSplitter and use either CommentsWithdrawalCodeSplitter or SimpleDelimiterCodeSplitter (i.e.: they have to share a common state or behaviour) which is false in this case.

CommentsWithdrawalCodeSplitter : Takes an input string like "this is a regular string /this is comments/ this is the end". You set the delimiters in the parameters to "/" and "/" using a DelimiterPair. What it does is take the first delimiter (i.e.: "/") and find the next usable delimiter (i.e.: "/"). By usable, I mean that it has to make distinction between code and comments, like this string: "test with /* comment = \"with delimiter/ inside\" and outside / a string".

(see: Test suite)

SimpleDelimiterCodeSplitter: This takes an input string like "Line of code 1;\nLine of code 2". You first set the delimiter to "\n" and split the string to isolate each line. Then you put them back together and split again with the delimiter ";" to isolate each code statement. You end up with ["Line of code 1", "Line of code 2"].

(See: Test suite)

Both classes are backed by the test suites. The common behavior/state between the two classes is be able to:

Regarding you test suite question, please refer to the ticket about it.

Cpt-xx commented 8 years ago

Aha... I think what LSP is talking about is class-structure, not class behaviour.

How I see it is that LSP is about for instance a List. We all know that a Cow, Bull and Calf are all Bovine, however a Calf is a small Cow, should a Calf be allowed in the List or not? They are all under the same level of abstraction (i.e. instance of Bovine) however the class Cow has specialized methods (e.g. giveMilk()) which the Calf doesn't have (it may have other methods such as succleMilk(Udder)).

So coming back to the example (which I haven't looked at yet): it seems to me that there is a common demoninator: CodeSplitter, but that there are two sub-classes. These sub-classes are Splitters taking 1 delimiter and Splitters taking two delimeters, which share the common method split(). After splitting they put the string back together again. And what we got here is a mix-up of terminology (or if you like a mix-u of responsibility): what you have called SimpleDelimiterCodeSplitter and CommentsWithdrawalCodeSplitter do two things: filtering and splitting. What I propose to do is to define a Filter interface (specializing in two sub-interfaces), which has a single method: filter(). And the two specializing filters take either a single argument or the DelimiterPair. Then we have the splitter which splits the string using some other delimiter. So the Splitter interface has at least 1 single method: split(). The implementations can be a DefaultSplitter (which simply uses a default character as delimiter) and a specialized NonDefaultSplitter, which has an additional method: setDelimiter(char) in which the delimiter can be set. Because the upper interface has a single method, we can use this interface in Lamba-expressions.

What are your thoughts?

Cpt-xx commented 8 years ago

I'd like to try my hands at this, but I think you'd need to assign this ticket to me (just to see if this assigning of issues works and how), currently you are the owner.

Cpt-xx commented 8 years ago

It seems that what I would call a filter (i.e. something that filters out unwanted stuff, such as text in comments) is not called a filter as used in Java streams. The filters I have in mind are filters in the sense of the pipes and filters paradigm (e.g. ps -elF | grep s where the output of ps is piped through to grep where it filters out whatever you want).

I will have to find out the terminology as used by Streams in Java and see how it is used in a Lambda expression type of construction.

fragmatyc commented 8 years ago

You are assigned :)

Cpt-xx commented 8 years ago

Just to give you a heads up:

I've now created an interface for the filters (or rather, what I call a filter). This interface extends Function<T, R> where I specify the T and R to be String. The purpose of this interface is twofold. Firstly it pins down the function to accept a String and returning a String. Using Function<T, R> opens the road towards Lambdas and using the implementations in Streams, as Function is a @FunctionalInterface.

I've also created 3 implementations for this interface:

These have been created in a new package com.sdlf.code.filter.

Next I have created another interface (this time in com.sdfl.splitter.simpledelimiter) called LinesSeparator. Again, this interface extends Function<T, R> but this time for Function<String, List>. The only implementation at this time is DefaultLinesSeparator, which takes a SimpleDelimiterCodeSplitterParameters for instantiation parameter. The method apply will firstly split the input String using the delimiter from the instantiation parameter (String.split(separators.getDelimiter()) is used), after that, if required by isKeepDelimiter, it will use the above SingleDelimiterAppendingCodeFilter to append the delimiter back for each entry.

I am now writing tests for these implementations and I will then try and find your tests for the original mixed responsibility classes to see if these new implementations will produce correct results.

Cpt-xx commented 8 years ago

I've got an issue with your implementation:

fragmatyc commented 8 years ago

Hi Cor, I don't get you second point. The intent of the last test of CommentsWithdrawalCodeSplitterTest is to test the following scenario:

insert into MY_TABLE using template 
    "String value with comment start /* hidden in it" -> MY_COLUMN /* With a comment after */;

That is the point of this.splitter.getParameters().addDontSplitBetweenTwo("\""); which says that we don't want to split comments if the comment delimiter is (in this case) in a String (or perhaps another comment!)

Cpt-xx commented 8 years ago

Sorry I missed the usecase of this one. I'll see what I can do to resolve this.

fragmatyc commented 8 years ago

No worries :)

Cpt-xx commented 8 years ago

Okay, I'm trying to make sense of the CodeSplitterParameters class. Basically it is a list of pairs of unqualified delimiters. The pair of delimiters denoting a comment (e.g. "/" and "/") have the same meaning and contents as the pair of delimiters denoting a String (e.g.""" and """). So how can I make a informed choice between the two?

As there are clear distinctions between a delimiter pair denoting a comment and a pair denoting a String, I would propose to either add a discriminator (in the form of an enum as there is a limited number of possibilities) or to make clear subclasses of the pair (so a CommentsDelimitersPair and a StringDelimitersPair) so the processing methods can make an informed and unambiguous choice between them. Using an enumeration will create the possibility of using a switch and using subclasses will enable polymorphic method calls.

So, please help me make a decision.

fragmatyc commented 8 years ago

Do it as you wish :)

The way I did it at first was to add a "don't split between two delimiters" property. I don't think it's the best solution as we might want to avoid splitting between more than one pair of delimiters. You make the call! I think your proposition of an enum is good, but please look at the class containing sdfl syntax. I'm using reflection on the class instead of enum for inheritance (enum can't extend).

The point is to think and be future proof, isn't it?

Cpt-xx commented 8 years ago

Okay... I'll try and do a proposal (working with unit tests etc.).

However I do want to have some things cleared up/asked:

I will have to think up a nice and clear way of filtering the comments and such out of the input text. The fact that a delimiters appearing in text quotes (and presumably the other way around as well) need to be ignored does throw a bit of a challenge at me. But I will find some nice way I'm sure. It might take some time though...

fragmatyc commented 8 years ago

We surely will need to escape some characters, like the double quotes in a String. Regarding the splitting of code with Strings, I agree it's a bit complex. That is why I had originally created a separated implementation of the code splitter. Perhaps a simple and effective solution could be to remove the strings and put them back after the code is cleaned? It could potentially accelerate the process if we start by removing the Strings instead of removing the new line chars?

Cpt-xx commented 8 years ago

A heads up: I'm working on a solution where any delimiter between a string delimiter is ignored. It is looking good so far, I'm working on (close to) 100% unit test coverage. After that I'll see if that logic works for delimiters between comment delimiters as well (it should, logically speaking).

After THAT we can see if we can adopt this way of working and modify the existing code to use this new material.