dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

'home-depot-sentence-similarity.csv' is missing #982

Open b4naki opened 1 year ago

b4naki commented 1 year ago

In the sentence similarity project, the file referenced by var dataPath = Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv"); does not exist.

DarrenTweedale commented 1 year ago

I think this file can be downloaded from Kaggle (https://www.kaggle.com/competitions/home-depot-product-search-relevance/data). Download train.csv.zip, extract it, rename the extracted CSV to home-depot-sentence-similarity.csv, and place it in the Data folder.
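The extract-and-rename steps could be scripted roughly as follows. This is a minimal sketch, assuming train.csv.zip has already been downloaded by hand (Kaggle requires a login) and that it contains a single CSV file; the names DatasetSetup and ExtractAndRename are hypothetical and not part of the samples repository.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DatasetSetup
{
    // Hypothetical helper: extracts the first .csv entry found in zipPath
    // and writes it to dataDir under the file name the sample expects.
    public static string ExtractAndRename(
        string zipPath,
        string dataDir,
        string targetName = "home-depot-sentence-similarity.csv")
    {
        // Make sure the target folder exists before writing into it.
        Directory.CreateDirectory(dataDir);
        string targetPath = Path.Combine(dataDir, targetName);

        using ZipArchive archive = ZipFile.OpenRead(zipPath);
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (entry.Name.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
            {
                // Extract directly to the renamed target, replacing any old copy.
                entry.ExtractToFile(targetPath, overwrite: true);
                return targetPath;
            }
        }

        throw new FileNotFoundException("No .csv entry found in archive.", zipPath);
    }
}
```

It might be called as DatasetSetup.ExtractAndRename("train.csv.zip", Path.GetFullPath(@"..\..\..\..\Data")), with both paths adjusted to wherever the download and the project actually live.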

b4naki commented 1 year ago

Thank you, this worked.

Symbai commented 1 year ago

Is there a way to download this without entering a phone number?!

wushifeng commented 10 months ago

Maybe:
1. Download home-depot-product-search-relevance.zip from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data
2. Extract train.csv.zip and product_descriptions.csv.zip into the Data directory.
3. Use the code below to generate home-depot-sentence-similarity.csv.

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace SentenceSimilarity
{
    internal class GenData
    {
        //  id product_uid product_title search_term relevance
        //  2   100001  Simpson Strong-Tie 12-Gauge Angle   angle bracket   3
        public class HomeDepot
        {
            [LoadColumn(0)]
            public int id { get; set; }

            [LoadColumn(1)]
            public int product_uid { get; set; }

            [LoadColumn(2)]
            public string product_title { get; set; }

            [LoadColumn(3)]
            public string search_term { get; set; }

            [LoadColumn(4)]
            public string relevance { get; set; }
        }

        // https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.custommappingcatalog.custommapping?view=ml-dotnet
        [CustomMappingFactoryAttribute("product_description")]
        private class ProdDescCustomAction : CustomMappingFactory<HomeDepot, CustomMappingOutput>
        {
            // Maps each input row to its product description, looked up by product_uid.
            // Falls back to an empty string when no description was loaded for that uid,
            // instead of throwing KeyNotFoundException.
            public static void CustomAction(HomeDepot input, CustomMappingOutput output)
                => output.product_description =
                    prodDesc.TryGetValue(input.product_uid.ToString(), out var desc) ? desc : string.Empty;

            public override Action<HomeDepot, CustomMappingOutput> GetMapping()
                => CustomAction;
        }
        // Defines only the column to be generated by the custom mapping
        // transformation in addition to the columns already present.
        private class CustomMappingOutput
        {
            public string product_description { get; set; }
        }

        static Dictionary<string, string> prodDesc = new Dictionary<string, string>();

        static void Main(string[] args)
        {
            var mlContext = new MLContext(seed: 1);

            var DataPath = Path.GetFullPath(@"..\..\..\..\Data\product_descriptions.csv");
            {
                IDataView dv = mlContext.Data.LoadFromTextFile(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true,
                    columns: new[]      {
                        new TextLoader.Column("product_uid",DataKind.String,0),
                        new TextLoader.Column("product_description",DataKind.String,1)
                    }
                  );
                // Preview returns at most maxRows rows, so rows beyond the cap
                // are not added to the lookup.
                foreach (var row in dv.Preview(maxRows: 150_000).RowView)
                {
                    string uid = "", desc = "";
                    foreach (KeyValuePair<string, object> column in row.Values)
                    {
                        if (column.Key == "product_uid")
                        {
                            uid = column.Value.ToString();
                        }
                        else
                        {
                            desc = column.Value.ToString();
                        }
                    }

                    prodDesc[uid] = desc;
                }
            }

            DataPath = Path.GetFullPath(@"..\..\..\..\Data\train.csv");
            IDataView dataView = mlContext.Data.LoadFromTextFile<HomeDepot>(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true);
            var preViewTransformedData = dataView.Preview(maxRows: 5);
            foreach (var row in preViewTransformedData.RowView)
            {
                var ColumnCollection = row.Values;
                string lineToPrint = "Row--> ";
                foreach (KeyValuePair<string, object> column in ColumnCollection)
                {
                    lineToPrint += $"| {column.Key}:{column.Value}";
                }
                Console.WriteLine(lineToPrint + "\n");
            }

            var pipeline = mlContext.Transforms.CustomMapping(new ProdDescCustomAction().GetMapping(), contractName:  "product_description");
            var transformedData = pipeline.Fit(dataView).Transform(dataView);

            // Only needed when loading a saved model that contains the custom mapping:
            //mlContext.ComponentCatalog.RegisterAssembly(typeof(ProdDescCustomAction).Assembly);
            Console.WriteLine("save file");
            using FileStream fs = new FileStream(Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv"), FileMode.Create);
            mlContext.Data.SaveAsText(transformedData, fs, schema: false, separatorChar:',');
        }
    }
}

After these operations, you should see the data file home-depot-sentence-similarity.csv.
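To confirm the file was actually written where the sample looks for it, a quick check along these lines may help. It is a minimal sketch using only System.IO; DataFileCheck and PeekLines are hypothetical names introduced here for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DataFileCheck
{
    // Hypothetical helper: returns the first `count` lines of the file at `path`,
    // or throws if the file is missing.
    public static string[] PeekLines(string path, int count = 3)
    {
        if (!File.Exists(path))
        {
            throw new FileNotFoundException("Data file not found.", path);
        }

        // File.ReadLines streams the file, so only the requested lines are read.
        return File.ReadLines(path).Take(count).ToArray();
    }
}
```

For example, passing Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv") to PeekLines and printing each returned line shows whether the header and the first rows look as expected.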

Symbai commented 10 months ago

maybe: 1 download data home-depot-product-search-relevance.zip from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data

Reposting the link does not help. The problem that a phone number is required still exists. I cannot download it without logging in. I don't have a Google account (creating one requires my phone number), same as others. Even creating a Kaggle account asks for my phone number.

wushifeng commented 10 months ago

Here is the processed data file: home-depot-sentence-similarity.zip