Open b4naki opened 1 year ago
I think this file can be downloaded from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data. Download train.csv.zip, extract it, rename the extracted csv to home-depot-sentence-similarity.csv, and place it in the data folder.
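The extract-and-rename part of this can also be scripted. Below is a minimal sketch, assuming train.csv.zip has already been downloaded by hand and contains a single train.csv at its root. The class name HomeDepotDataSetup and the method PrepareTrainingFile are hypothetical helpers, not part of the sample.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the sample project.
public static class HomeDepotDataSetup
{
    // Extracts train.csv from zipPath and copies it into dataDir under the
    // file name the sample looks for. Returns the path of the copied file.
    public static string PrepareTrainingFile(string zipPath, string dataDir)
    {
        Directory.CreateDirectory(dataDir);

        // Extract into a scratch folder so nothing else in dataDir is touched.
        string scratch = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            ZipFile.ExtractToDirectory(zipPath, scratch);

            // Assumption: the archive holds train.csv at its root.
            string extracted = Path.Combine(scratch, "train.csv");
            if (!File.Exists(extracted))
            {
                throw new FileNotFoundException(
                    "train.csv was not found inside the archive.", extracted);
            }

            string target = Path.Combine(dataDir, "home-depot-sentence-similarity.csv");
            File.Copy(extracted, target, overwrite: true);
            return target;
        }
        finally
        {
            // Clean up the scratch folder whether or not the copy succeeded.
            if (Directory.Exists(scratch))
            {
                Directory.Delete(scratch, recursive: true);
            }
        }
    }
}
```

It could be called as HomeDepotDataSetup.PrepareTrainingFile("train.csv.zip", "Data"), with both paths adjusted to wherever the zip and the data folder actually live.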
Thank you, this worked.
Is there a way to download this without entering a phone number?!
Maybe:
1. Download home-depot-product-search-relevance.zip from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data
2. Extract train.csv.zip and product_descriptions.csv.zip into the Data directory.
3. Use the code below to generate home-depot-sentence-similarity.csv.
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
namespace SentenceSimilarity
{
internal class GenData
{
// id product_uid product_title search_term relevance
// 2 100001 Simpson Strong-Tie 12-Gauge Angle angle bracket 3
public class HomeDepot
{
[LoadColumn(0)]
public int id { get; set; }
[LoadColumn(1)]
public int product_uid { get; set; }
[LoadColumn(2)]
public string product_title { get; set; }
[LoadColumn(3)]
public string search_term { get; set; }
[LoadColumn(4)]
public string relevance { get; set; }
}
// https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.custommappingcatalog.custommapping?view=ml-dotnet
[CustomMappingFactoryAttribute("product_description")]
private class ProdDescCustomAction : CustomMappingFactory<HomeDepot, CustomMappingOutput>
{
// We define the custom mapping between input and output rows that will
// be applied by the transformation.
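// Look up the description by product_uid; fall back to an empty string
// if the uid has no entry in product_descriptions.csv.
public static void CustomAction(HomeDepot input, CustomMappingOutput output)
=> output.product_description = prodDesc.TryGetValue(input.product_uid.ToString(), out var desc) ? desc : "";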
public override Action<HomeDepot, CustomMappingOutput> GetMapping()
=> CustomAction;
}
// Defines only the column to be generated by the custom mapping
// transformation in addition to the columns already present.
private class CustomMappingOutput
{
public string product_description { get; set; }
}
static Dictionary<string, string> prodDesc = new Dictionary<string, string>();
static void Main(string[] args)
{
var mlContext = new MLContext(seed: 1);
var DataPath = Path.GetFullPath(@"..\..\..\..\Data\product_descriptions.csv");
{
IDataView dv = mlContext.Data.LoadFromTextFile(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true,
columns: new[] {
new TextLoader.Column("product_uid",DataKind.String,0),
new TextLoader.Column("product_description",DataKind.String,1)
}
);
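// Preview materializes the rows in memory; maxRows must be at least the
// number of rows in product_descriptions.csv or some descriptions are skipped.
foreach (var row in dv.Preview(maxRows: 150_000).RowView)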
{
string uid="", desc="";
foreach (KeyValuePair<string, object> column in row.Values)
{
if (column.Key == "product_uid")
{
uid = column.Value.ToString();
}
else
{
desc= column.Value.ToString();
}
}
prodDesc[uid] = desc;
}
}
DataPath = Path.GetFullPath(@"..\..\..\..\Data\train.csv");
IDataView dataView = mlContext.Data.LoadFromTextFile<HomeDepot>(DataPath, hasHeader: true, separatorChar: ',', allowQuoting: true);
var preViewTransformedData = dataView.Preview(maxRows: 5);
foreach (var row in preViewTransformedData.RowView)
{
var ColumnCollection = row.Values;
string lineToPrint = "Row--> ";
foreach (KeyValuePair<string, object> column in ColumnCollection)
{
lineToPrint += $"| {column.Key}:{column.Value}";
}
Console.WriteLine(lineToPrint + "\n");
}
var pipeline = mlContext.Transforms.CustomMapping(new ProdDescCustomAction().GetMapping(), contractName: "product_description");
var transformedData = pipeline.Fit(dataView).Transform(dataView);
// Only needed when loading a saved model that contains this custom mapping:
//mlContext.ComponentCatalog.RegisterAssembly(typeof(ProdDescCustomAction).Assembly);
Console.WriteLine("save file");
using FileStream fs = new FileStream(Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv"), FileMode.Create);
mlContext.Data.SaveAsText(transformedData, fs, schema: false, separatorChar:',');
}
}
}
After these steps, the data file home-depot-sentence-similarity.csv should be in the Data directory.
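To check the result without opening the whole file, the first few lines can be read back. This is a minimal sketch; CsvPeek and FirstLines are hypothetical names, and the path passed in is whatever location the file was written to.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for a quick look at the generated file.
public static class CsvPeek
{
    // Returns up to 'count' physical lines from the start of the file.
    // File.ReadLines streams the file, so only the requested lines are read.
    public static List<string> FirstLines(string path, int count)
        => File.ReadLines(path).Take(count).ToList();
}
```

For example, CsvPeek.FirstLines(path, 5) returns the first five lines, which can then be written out with Console.WriteLine. Note that a quoted field may contain line breaks, so a physical line is not necessarily one complete record.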
maybe: 1 download data home-depot-product-search-relevance.zip from https://www.kaggle.com/competitions/home-depot-product-search-relevance/data
Reposting the link does not help. The problem that a phone number is required still exists. I cannot download the file without logging in. I don't have a Google account (creating one requires my phone number), and the same goes for others. Even creating a Kaggle account asks for my phone number.
Here is the processed data file. home-depot-sentence-similarity.zip
In the sentence similarity project, the path
var dataPath = Path.GetFullPath(@"..\..\..\..\Data\home-depot-sentence-similarity.csv");
does not exist.
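Until the file is in place, the sample fails at load time. A small guard can make that failure easier to read by reporting the resolved absolute path. This is a minimal sketch; DataPathCheck and RequireDataFile are hypothetical names, not part of the sample.

```csharp
using System;
using System.IO;

// Hypothetical guard, not part of the sample project.
public static class DataPathCheck
{
    // Resolves relativePath against the current directory and returns the
    // absolute path, or throws with a message naming the expected location.
    public static string RequireDataFile(string relativePath)
    {
        string fullPath = Path.GetFullPath(relativePath);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException(
                "Data file not found at '" + fullPath +
                "'. It may need to be downloaded or generated first.",
                fullPath);
        }
        return fullPath;
    }
}
```

In the sample it could wrap the existing call, for example var dataPath = DataPathCheck.RequireDataFile(@"..\..\..\..\Data\home-depot-sentence-similarity.csv");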