HPI-Information-Systems / Metanome

The source repository of the Metanome tool
metanome.de
Apache License 2.0
171 stars 62 forks source link

Want change in Normalization algorithm #405

Closed s1knd closed 3 years ago

s1knd commented 5 years ago

Normalize algorithm, it will take a dataset (represent table with columns and data) and then the final output it will check the table if it is in BCNF normal form or not. If it is NOT in BCNF it will produce an output that is a new schema of the table (decomposed of two or more tables) that is in BCNF with a primary key for each of the new tables. If the dataset is already in BCNF normal form It will just produce a primary key for the table.

As you can see the algorithm has 7 steps: (the highlighted step is the step I want to change as I will explain later) 1—FD discovery: here it uses a specific algorithm name HyFD to discover and extract all the functional dependencies FDs

2—Closure calculation : this is to calculate the closure of the given FDs from step 1, this step is necessary to extract the keys of the table and check violations of BCNF.

3—key derivation: here they use a specific algorithm to derive the keys named DUCC

4—violating FD identification: after having the keys and the functional dependencies, this step the algorithm will check each functional dependency and identify the functional dependencies that violate BCNF. If there are violating FDs it will go to step 5 then 6 then repeat 3,4 until there are no violations then finally step 7. If there are NO violating FD it will go to step 7 directly. I want to change this and make much simpler as I will explain later.

5—violation FD selection: if there are violating FD this step will select specific FDs

6—schema decomposition: in this step it will decompose the data set to BCNF

7—Primary key selection: this step is to select a primary key for the table

The changes I want to make:

Mainly I want the algorithm to stop at step 4 and ignore steps 5,6,7 and do NOT execute them. Only display a message if the dataset is BCNF or not in BCNF. After you finish the changes and you test it with metanome with the dataset of metanome and every thing is correct, I want you to send me the jar file ( and also the source code)so I can run it in metanome in my computer

How to change: I want to change step 4 in the algorithm because I want the algorithm to check the FDs one by one, if it finds a violating FD it should exit and print a message on the screen “the table violates BCNF normal form”, and do NOT check the rest of the FDs, just print the message and end the program (do not execute steps 5,6,7). Otherwise it will continue to check the rest of the FDs and when there is no violating FD in all FDs it will exit and print a message on the screen “the table is in BCNF normal form” and also end the program (do not execute steps 5,6,7).

Can you help in this change

s1knd commented 5 years ago

hello sir

thorsten-papenbrock commented 5 years ago

For that, you don't need to change the algorithm: Run the algorithm and if the result is the same as the input relation, i.e., if there is only one output relation, “the table is in BCNF normal form”; otherwise, “the table violates BCNF normal form”.

If you really need that change, got to: https://github.com/HPI-Information-Systems/metanome-algorithms/blob/master/Normalize/src/main/java/de/metanome/algorithms/normalize/Normi.java#L342 and replace the test with: if (violatingFds.isEmpty()) System.out.println(“the table is in BCNF normal form”); else System.out.println(“the table violates BCNF normal form”); return;

s1knd commented 5 years ago

Thank you so much Sir.

But Now i just need last help and again i will be very thankful to if you give me some to solve my that problem.

* Need Small change***

In Normalization algorithm Schema.java has the method getViolatingFds. The algorithm of getViolatingFds is in the following picture: https://imge.to/i/lcivA

As you see in the picture , the method checks the FDs in a loop one by one, it also checks other things : from line 10 to line 14 it checks stuff for primary key and foreign key Which I do not need and I do not want it to check so the changes in this method I want:

  1. Do NOT execute from line 10 to line 14 ( not to perform the checking)
  2. If the algorithm finds one violating FD it should put it in list and exit the loop so the loop have one violating FD. Otherwise if the FD is not violating BCNF the loop should continue to check the rest of the FDs. This change can be done as the following :

In simple if algorithm found one violating FD it stop their and show the message and terminate Not check rest of fds Violating or not. Whenever algorithm found one violating fds it just terminate immediately.

Just this small change i want i hope it will not take too much time of your. Please Help me Sir i am waiting for your reply.. Thanks

thorsten-papenbrock commented 5 years ago

You can find the function from the pseudo code here: https://github.com/HPI-Information-Systems/metanome-algorithms/blob/master/Normalize/src/main/java/de/metanome/algorithms/normalize/structures/Schema.java#L427

To not execute the lines 10 to 14, you can remove the lines 434 to 444 from the actual code. I don't know which other effects this might have in the algorithm, but if you thought this through and need it, this removal should do the trick.

The loop of the pseudo code is actually implemented as a parallel task, i.e., the fds are checked in parallel rather than one after another. So its not that easy to simply exit the loop. You could remove the parallelization and replace the line 383 with the content of the function in line 427. Then it should be easy to simply break the loop when then first violation is found.