LoLei / spmf-py

Python SPMF Wrapper 🐍 🎁
GNU General Public License v3.0

[help] [frequent itemset mining] Understanding output with negative value #9

Open SantoshKumarRaju opened 1 year ago

SantoshKumarRaju commented 1 year ago (Jan 9, 2023)

Looking for clarity on the output of the FP-Growth algorithm. I am doing frequent itemset mining, and I repeatedly see negative values in the output itemsets even though my dataset doesn't contain any negative values. Curious as to how to interpret this negative value.

Below is an example:

from spmf import Spmf
input_example_list = [
    "1, 3, 4",
    "2, 3, 5",
    "1, 2, 3, 5",
    "2, 5",
    "1, 2, 4, 5"
]

spmf = Spmf("FPGrowth_itemsets",
            input_direct=input_example_list,
            input_type="text",
            output_filename="C:\\spaces\\igt_eye\\trials\\itemset\\output.txt",
            arguments=[0.4, 3, 3],
            spmf_bin_location_dir="\\site-packages\\spmf\\")
spmf.run()
print(spmf.parse_output())

This produces the following output:

=============  FP-GROWTH 2.42 - STATS =============
 Transactions count from database : 5
 Max memory usage: 8.0 mb 
 Frequent itemsets count : 9
 Total time ~ 4 ms
===================================================
Post-processing to show result in terms of string values.
Post-processing completed.

[
['-2 1 4 #SUP: 2'], 
['-2 3 5 #SUP: 2'], 
['3 2 5 #SUP: 2'], 
['-2 3 2 #SUP: 2'], 
['-2 1 3 #SUP: 2'], 
['-2 1 2 #SUP: 2'], 
['-2 1 5 #SUP: 2'], 
['1 2 5 #SUP: 2'], 
['-2 2 5 #SUP: 4']
]

In the output above, I am not sure how to interpret the negative value (-2) in these itemsets. Any pointers or hints from the community?
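A small helper makes output like this easier to inspect. The sketch below assumes, as the printed result suggests, that each row returned by `parse_output()` is a one-element list holding a line of the form `<items> #SUP: <support>`. `parse_itemset_line` is a hypothetical helper, not part of spmf-py. It separates the itemsets that contain a negative item from those that do not.

```python
from typing import List, Tuple


def parse_itemset_line(line: str) -> Tuple[List[int], int]:
    """Split an SPMF result line such as '-2 1 4 #SUP: 2' into ([-2, 1, 4], 2)."""
    items_part, support_part = line.split("#SUP:")
    return [int(tok) for tok in items_part.split()], int(support_part)


# Rows as printed by parse_output() above: one-element lists of strings.
rows = [
    ['-2 1 4 #SUP: 2'], ['-2 3 5 #SUP: 2'], ['3 2 5 #SUP: 2'],
    ['-2 3 2 #SUP: 2'], ['-2 1 3 #SUP: 2'], ['-2 1 2 #SUP: 2'],
    ['-2 1 5 #SUP: 2'], ['1 2 5 #SUP: 2'], ['-2 2 5 #SUP: 4'],
]

parsed = [parse_itemset_line(row[0]) for row in rows]
# Itemsets containing at least one negative item vs. purely non-negative ones.
with_negative = [p for p in parsed if any(i < 0 for i in p[0])]
without_negative = [p for p in parsed if all(i >= 0 for i in p[0])]

print(len(with_negative))   # 7
print(without_negative)     # [([3, 2, 5], 2), ([1, 2, 5], 2)]
```

Only two of the nine rows are free of the -2.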

LoLei commented 1 year ago

You could first try running the SPMF Jar itself. If it reproduces these results, the issue lies there; spmf-py is just a wrapper for the Jar.
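To reproduce the run outside spmf-py, something along the lines of the sketch below could be used. It assumes SPMF's documented command-line form `java -jar spmf.jar run <Algorithm> <input> <output> <parameters>` and its itemset input format (one transaction per line, positive integer items separated by single spaces), so the comma-separated rows from the original post are rewritten first. The helper names (`to_spmf_lines`, `build_spmf_command`, `run_spmf`) and all file paths are hypothetical, and the parameters simply mirror the `[0.4, 3, 3]` that was passed to spmf-py.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def to_spmf_lines(rows: Sequence[str]) -> List[str]:
    """Turn comma-separated rows like "1, 3, 4" into SPMF transaction lines like "1 3 4"."""
    lines = []
    for row in rows:
        # Parse items as integers (SPMF itemset files use integer items) and sort them.
        items = sorted(int(tok) for tok in row.split(",") if tok.strip())
        lines.append(" ".join(str(i) for i in items))
    return lines


def build_spmf_command(jar: Path, algorithm: str, input_file: Path,
                       output_file: Path, params: Sequence[object]) -> List[str]:
    """Assemble: java -jar <jar> run <algorithm> <input> <output> <params...>"""
    return ["java", "-jar", str(jar), "run", algorithm,
            str(input_file), str(output_file), *(str(p) for p in params)]


def run_spmf(rows: Sequence[str], jar: Path, algorithm: str,
             input_file: Path, output_file: Path, params: Sequence[object]) -> str:
    """Write the input file, run the SPMF Jar, and return its raw output text."""
    input_file.write_text("\n".join(to_spmf_lines(rows)) + "\n", encoding="utf-8")
    subprocess.run(build_spmf_command(jar, algorithm, input_file, output_file, params),
                   check=True)
    return output_file.read_text(encoding="utf-8")
```

For example, `run_spmf(["1, 3, 4", "2, 3, 5", "1, 2, 3, 5", "2, 5", "1, 2, 4, 5"], Path("spmf.jar"), "FPGrowth_itemsets", Path("input.txt"), Path("output.txt"), [0.4, 3, 3])` would run the Jar on the same five transactions. Comparing its output with `parse_output()` shows whether the extra itemsets come from the Jar or from how the wrapper prepares the input.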


SantoshKumarRaju commented 1 year ago

Hello LoLei, I did as you suggested, and the Jar gives only two itemsets:

2 3 5 #SUP: 2
1 2 5 #SUP: 2

whereas spmf-py gives me additional itemsets containing negative values. Not sure what is happening here!
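As a sanity check on the data itself, here is a minimal brute-force sketch. It is not FP-Growth and not SPMF's code. It enumerates itemsets of exactly three items with support of at least 40% (two of the five transactions). It assumes the two `3`s in `[0.4, 3, 3]` bound the itemset length, which is consistent with every itemset in both outputs having three items. On the plain data it finds only {2, 3, 5} and {1, 2, 5}, matching the Jar. Appending a hypothetical extra item -2 to every transaction reproduces all nine spmf-py lines exactly, supports included. This suggests the -2 reaches FP-Growth as an item present in every transaction. In SPMF's sequence database format, -2 marks the end of a sequence, so one plausible but unconfirmed explanation is that the `input_type="text"` conversion leaves such a marker in the itemset input.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def frequent_itemsets(transactions: Sequence[Sequence[int]], minsup: float,
                      min_len: int, max_len: int) -> Dict[FrozenSet[int], int]:
    """Brute force: every itemset with min_len..max_len items whose support
    (number of transactions containing it) is at least minsup * |transactions|."""
    sets = [frozenset(t) for t in transactions]
    threshold = minsup * len(sets)
    items = sorted(set().union(*sets))
    result: Dict[FrozenSet[int], int] = {}
    for k in range(min_len, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for s in sets if candidate <= s)
            if support >= threshold:
                result[candidate] = support
    return result


# The five transactions from the original post.
transactions: List[List[int]] = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5], [1, 2, 4, 5]]

# Plain data: the Jar's two itemsets.
plain = frequent_itemsets(transactions, 0.4, 3, 3)

# Hypothesis (assumption): -2 is present as an extra item in every transaction.
with_marker = frequent_itemsets([t + [-2] for t in transactions], 0.4, 3, 3)

# The nine itemsets printed by spmf-py, with their supports.
spmf_py_output = {
    frozenset({-2, 1, 4}): 2, frozenset({-2, 3, 5}): 2, frozenset({3, 2, 5}): 2,
    frozenset({-2, 3, 2}): 2, frozenset({-2, 1, 3}): 2, frozenset({-2, 1, 2}): 2,
    frozenset({-2, 1, 5}): 2, frozenset({1, 2, 5}): 2, frozenset({-2, 2, 5}): 4,
}

print(sorted(sorted(s) for s in plain))  # [[1, 2, 5], [2, 3, 5]]
print(with_marker == spmf_py_output)     # True
```

The seven -2 rows are exactly the seven frequent pairs of the original data, each padded to length three by the always-present -2, which is why `-2 2 5` carries the support of {2, 5} (4).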