Closed horvathr closed 5 years ago
A Doc2Vec-type model implicitly factorizes a matrix with nonzero elements. This is a document-word (document-term) matrix where rows are documents, columns are words, and each entry counts how many times a word appeared in a document. Doc2Vec performs this decomposition with negative sampling, a form of noise-contrastive estimation. The technique needs zero elements in the target matrix that can be sampled from each row as negative samples. Graph2Vec factorizes a matrix where the documents are graphs and the words in the documents are WL (Weisfeiler-Lehman) features. Because of your experimental design, your document-term matrix has identical rows with only positive elements. The gradients then work against each other: shared features pull the graphs closer, while negative sampling pushes them apart. In this case the negative samples are not good ones, because every feature you can sample is one that actually occurs in the graph. As a result, the graphs would be laid out uniformly in the embedding space around the origin.
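To make the "identical rows, no zeros" point concrete, here is a minimal sketch of WL feature extraction applied to five copies of the graph from the issue. It is not the graph2vec implementation: the function name `wl_features`, the choice of two refinement rounds, and the MD5 hashing of neighbourhood signatures are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def wl_features(graph, iterations=2):
    """Return the bag of Weisfeiler-Lehman features for one graph dict (illustrative)."""
    # Build an undirected adjacency list from the edge list.
    neighbors = {node: [] for node in graph["features"]}
    for u, v in graph["edges"]:
        neighbors[str(u)].append(str(v))
        neighbors[str(v)].append(str(u))

    labels = dict(graph["features"])
    bag = Counter(labels.values())  # round 0: the raw node labels
    for _ in range(iterations):
        new_labels = {}
        for node, label in labels.items():
            # A node's new label summarizes its own label plus its sorted neighbour labels.
            neighborhood = sorted(labels[n] for n in neighbors[node])
            signature = "_".join([label] + neighborhood)
            new_labels[node] = hashlib.md5(signature.encode()).hexdigest()
        labels = new_labels
        bag.update(labels.values())
    return bag


graph = {
    "edges": [[0, 1], [0, 2], [0, 3], [3, 4], [3, 5], [3, 2], [3, 1]],
    "features": {"0": "65", "1": "30", "2": "30", "3": "67", "4": "30", "5": "30"},
}

# Five copies of the same graph, as in the issue.
bags = [wl_features(graph) for _ in range(5)]
vocabulary = set().union(*bags)

# Every row of the document-term matrix is the same ...
all_rows_identical = all(bag == bags[0] for bag in bags)
# ... and no row has a zero entry, so there is no true negative to sample.
zero_entries = sum(1 for bag in bags for feature in vocabulary if bag[feature] == 0)
print(all_rows_identical, zero_entries)  # True 0
```

With only one distinct document, every word in the vocabulary is a positive for every graph, so whatever the negative sampler draws is a feature the graph really has.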
Originally I ran diagnostics on this tool with simulated Watts-Strogatz graphs with heterogeneous rewiring probability. Based on the learned representations I was able to predict the rewiring probability for the nodes. In your case, I would generate a few more classes of graphs and add some randomness within the classes. If this is a real-world project, I am happy to do consulting.
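A rough sketch of what such a test set could look like, using only the standard library. Everything here is an assumption for illustration: the helper names, the three rewiring probabilities, the graph size, and the use of node degree as the node label. The output dictionaries follow the `edges` / `features` layout of the JSON files in the issue. The rewiring step is a simplified Watts-Strogatz variant, not a reference implementation.

```python
import json
import random


def watts_strogatz_edges(n, k, p, rng):
    """Ring lattice on n nodes (k nearest neighbours each), edges rewired with probability p."""
    edges = set()
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            edges.add(tuple(sorted((node, (node + offset) % n))))
    # Iterate over a snapshot so the set can be modified while rewiring.
    for u, v in sorted(edges):
        if rng.random() < p:
            # Keep endpoint u and pick a new endpoint that creates no self-loop or duplicate.
            candidates = [w for w in range(n)
                          if w != u and tuple(sorted((u, w))) not in edges]
            if candidates:
                edges.remove((u, v))
                edges.add(tuple(sorted((u, rng.choice(candidates)))))
    return sorted(edges)


def to_graph_dict(edges, n):
    """Pack an edge list into the edges/features layout, labelling nodes by degree (assumption)."""
    degree = {str(node): 0 for node in range(n)}
    for u, v in edges:
        degree[str(u)] += 1
        degree[str(v)] += 1
    return {"edges": [list(edge) for edge in edges],
            "features": {node: str(d) for node, d in degree.items()}}


rng = random.Random(42)
dataset, rewiring = {}, {}
for p in (0.05, 0.3, 0.8):          # three classes of graphs
    for _ in range(10):             # random variation within each class
        name = f"{len(dataset)}.json"
        dataset[name] = to_graph_dict(watts_strogatz_edges(20, 4, p, rng), 20)
        rewiring[name] = p

print(len(dataset), json.dumps(dataset["0.json"])[:60])
```

Each entry of `dataset` can be written to its own file with `json.dump`, and `rewiring` keeps the class label so the learned embeddings can be checked against it afterwards.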
Thank you so much for your explanation. We will continue to test our data by adding more classes of graphs. We will contact you if we require further consulting. Again, thank you for providing us with some insights.
On Sat, Oct 26, 2019 at 11:14 PM Benedek Rozemberczki < notifications@github.com> wrote:
We have used graph2vec to represent our subgraphs as fixed-length feature vectors for our project.
Using the default parameters, we entered 5 identical subgraphs (see below), but the features produced for the identical graphs (see attached csv file) are vastly different.
Also, K-means clustering on these features did not produce tight clusters for the identical subgraphs. Can you provide us with some insight as to what we may be doing wrong? We appreciate any suggestions you can provide. Thank you so much.
Input
0.json
{"edges": [[0, 1], [0, 2], [0, 3], [3, 4], [3, 5], [3, 2], [3, 1]], "features": {"0": "65", "1": "30", "2": "30", "3": "67", "4": "30", "5": "30"}}
1.json
{"edges": [[0, 1], [0, 2], [0, 3], [3, 4], [3, 5], [3, 2], [3, 1]], "features": {"0": "65", "1": "30", "2": "30", "3": "67", "4": "30", "5": "30"}}
2.json
{"edges": [[0, 1], [0, 2], [0, 3], [3, 4], [3, 5], [3, 2], [3, 1]], "features": {"0": "65", "1": "30", "2": "30", "3": "67", "4": "30", "5": "30"}}
3.json
{"edges": [[0, 1], [0, 2], [0, 3], [3, 4], [3, 5], [3, 2], [3, 1]], "features": {"0": "65", "1": "30", "2": "30", "3": "67", "4": "30", "5": "30"}}
4.json
{"edges": [[0, 1], [0, 2], [0, 3], [3, 4], [3, 5], [3, 2], [3, 1]], "features": {"0": "65", "1": "30", "2": "30", "3": "67", "4": "30", "5": "30"}}
features
type,x_0,x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9,x_10,x_11,x_12,x_13,x_14,x_15,x_16,x_17,x_18,x_19,x_20,x_21,x_22,x_23,x_24,x_25,x_26,x_27,x_28,x_29,x_30,x_31,x_32,x_33,x_34,x_35,x_36,x_37,x_38,x_39,x_40,x_41,x_42,x_43,x_44,x_45,x_46,x_47,x_48,x_49,x_50,x_51,x_52,x_53,x_54,x_55,x_56,x_57,x_58,x_59,x_60,x_61,x_62,x_63,x_64,x_65,x_66,x_67,x_68,x_69,x_70,x_71,x_72,x_73,x_74,x_75,x_76,x_77,x_78,x_79,x_80,x_81,x_82,x_83,x_84,x_85,x_86,x_87,x_88,x_89,x_90,x_91,x_92,x_93,x_94,x_95,x_96,x_97,x_98,x_99,x_100,x_101,x_102,x_103,x_104,x_105,x_106,x_107,x_108,x_109,x_110,x_111,x_112,x_113,x_114,x_115,x_116,x_117,x_118,x_119,x_120,x_121,x_122,x_123,x_124,x_125,x_126,x_127
0,-0.002921980805695057,0.0019917855970561504,0.00042243595817126334,-0.0001146442664321512,0.0027856503147631884,-0.003612641477957368,0.0007139553781598806,-0.002218850189819932,0.0007622303673997521,-0.003667013719677925,0.00020051038882229477,0.002386653795838356,-0.0015244260430335999,-0.001942446338944137,0.0015865900786593556,-0.002430693479254842,-1.423046433046693e-05,0.00043223355896770954,-0.0004984369734302163,0.0011258882004767656,-0.0006631789728999138,-0.001834464492276311,-0.002684623934328556,0.0021268008276820183,0.0016442902851849794,0.002274706494063139,0.0024222894571721554,0.0014194073155522346,0.0023438141215592623,0.0015838627004995942,0.002310663228854537,0.0014568573096767068,0.000690316257532686,0.0004983388935215771,-0.0026711630634963512,0.00340675818733871,0.002079088008031249,-0.0013625413412228227,0.0018945640185847878,0.0011790600838139653,0.003815916134044528,0.0034210982266813517,0.0037433707620948553,-0.0022726887837052345,-0.0026500080712139606,-0.0010492472210898995,0.0013163144467398524,-0.0001258519769180566,-0.0017091833287850022,0.003472822019830346,-0.0036862220149487257,0.003828400745987892,-0.0016809114022180438,0.0032152864150702953,-0.0008184377802535892,0.0034521056804805994,0.0032005112152546644,-0.0024409648030996323,0.0028309114277362823,0.002729770727455616,-0.003585312981158495,-0.00024177387240342796,-0.0002953226794488728,0.0005578010459430516,-0.0013301002327352762,-0.00023178242554422468,-0.0010268031619489193,-0.002611540723592043,0.0016747136833146214,0.0037863797042518854,-0.0014814079040661454,-0.0009825780289247632,0.000286577211227268,0.00369463162496686,0.0003661931841634214,-0.0036635601427406073,0.0038319083396345377,0.00034779138513840735,-0.0029241384472697973,-0.00015080621233209968,0.00245654652826488,0.0012698960490524769,0.0038113698828965425,0.0020659039728343487,-0.002460563788190484,0.003330332925543189,0.000335091317538172,0.0027111624367535114,-0.0021673559676855803,0.003040780546143651,0.0014197180280461907,0.0025863349437713623,-0.0008595485123805702,0.0016730247298255563,-0.002238348824903369,-0.0014983880100771785,-0.002856674138456583,0.003900788491591811,-0.0029116503428667784,0.0008353788871318102,-0.003707862226292491,-0.003220281330868602,0.0038748173974454403,-0.0012597186723724008,-0.0022263021674007177,0.0032723527401685715,0.001983931753784418,-0.0009860624559223652,-0.0026594484224915504,0.0005092752980999649,-0.0010045311646535993,0.00043074158020317554,-0.0006234926404431462,-0.0036853901110589504,0.0033234185539186,-0.0001760138402460143,0.0009347024606540799,0.0006853214581497014,-0.0009750690078362823,-0.001720305997878313,0.0033709295094013214,0.0024223600048571825,-0.001860664109699428,-0.0033350486773997545,5.440079985419288e-05,-0.0017850360600277781,0.0006075493292883039,0.0017705451464280486
1,-0.0007899366319179535,0.0034763035364449024,-0.0031649558804929256,-0.0038582677952945232,0.0030589941889047623,0.0024385270662605762,0.0007023548823781312,0.002542045433074236,0.003084726631641388,-0.0021342230029404163,0.002349107526242733,-0.0007615322829224169,-0.0033563957549631596,-0.001407726900652051,0.0017020516097545624,0.0008983929292298853,6.217532063601539e-05,-0.0019879427272826433,-0.0017635654658079147,-0.002506680553779006,-0.0027478302363306284,-0.001141324290074408,-0.00326850195415318,-0.0014613013481721282,-0.0017065171850845218,0.0011920499382540584,-0.0012276453198865056,0.0024964376352727413,0.0026812769938260317,-0.001462650136090815,0.0023853315506130457,-0.0022144729737192392,0.0016651339828968048,-0.0033012968488037586,-0.0017995499074459076,-0.0007093038293533027,-0.0007944940007291734,-0.0008988109766505659,-0.0006396020180545747,0.0036315470933914185,0.0031063579954206944,0.003404709044843912,-0.0034592982847243547,-0.0023505527060478926,0.0009349206811748445,0.0008606857154518366,-0.0034494020510464907,-0.0030351970344781876,0.0020507730077952147,-0.0034043036866933107,-0.0013356079580262303,0.0026185866445302963,-0.0003209983406122774,0.0010972495656460524,-0.003353237872943282,0.0005406123818829656,9.852329094428569e-05,0.0011593784438446164,-0.001968847122043371,-0.0034255434293299913,-0.0021584834903478622,0.0038824898656457663,-0.002288225805386901,-0.0021955370903015137,-0.0008158088312484324,-0.001036758767440915,-0.0030485547613352537,-0.0029740005265921354,0.00041252037044614553,0.0008094724034890532,-0.0013014365686103702,0.002923062304034829,0.0028244403656572104,0.0015162983909249306,0.0033367632422596216,-0.0024903069715946913,-0.0021822594571858644,-0.0003872658417094499,0.0014111202908679843,0.003218378871679306,0.00031110228155739605,-0.0031358294654637575,-0.0004674529191106558,-0.0006635853787884116,0.002574484795331955,4.1514329495839775e-05,-0.002413139445707202,-0.0034133922308683395,-8.862825779942796e-05,0.00010387676593381912,0.0013719667913392186,-0.0002543768205214292,0.0024005521554499865,-0.0018111238023266196,0.0009683762909844518,0.0032069890294224024,-0.0037442264147102833,-0.0023161424323916435,-0.0002594651887193322,0.001383508089929819,0.0036754277534782887,0.001844267826527357,-0.0038253869861364365,-0.0007103432435542345,-0.0018077954882755876,0.0030866663437336683,-8.397615602007136e-05,-0.0033143775071948767,-0.0012993044219911098,-0.0032517346553504467,0.002442749449983239,-0.0012355140643194318,0.0037098878528922796,0.0011372631415724754,-0.00047043408267199993,-0.0038157363887876272,-0.0017348530236631632,0.003638728056102991,0.001819252036511898,-0.00027891306672245264,0.0023341928608715534,0.000704775273334235,-0.001470968360081315,-0.0021361587569117546,-0.003107558935880661,0.0007341455202549696,0.0002843852271325886,0.0025384246837347746
2,-0.003191772848367691,0.0001835404837038368,-0.0011708770180121064,-0.0014271646505221725,0.0017692336114123464,0.0017457834910601377,0.002994104754179716,0.0028854620177298784,0.001106327516026795,-0.0017014361219480634,-0.002501782262697816,-0.003046308644115925,0.002076556906104088,0.0007232814095914364,0.003546368330717087,-0.0031842486932873726,-0.00012119952589273453,0.0027201976627111435,0.0035334290005266666,0.0007723849266767502,0.003673312021419406,0.00033438167884014547,-0.002535406267270446,-0.0035727310460060835,-0.0018347441218793392,-0.0006416176911443472,-0.0006869172211736441,-0.002686522202566266,0.0034463251940906048,0.000578683044295758,-0.0034865231718868017,-0.0035254214890301228,-0.0027820246759802103,0.0027613919228315353,0.0012621080968528986,-0.0025481393095105886,-0.0011443900875747204,0.00013140295050106943,0.0005833808099851012,0.0010550469160079956,-0.0019496568711474538,0.003822379047051072,0.0028459352906793356,-0.0032757942099124193,0.0025608153082430363,0.0030413619242608547,-0.0005989212077111006,-0.0028994232416152954,0.00025810833903960884,1.2741512364300434e-05,-0.0011288640089333057,0.003362147370353341,-0.0033266739919781685,-0.0017139799892902374,0.0008058756939135492,-0.0030933453235775232,-0.0032492964528501034,-0.0017976377857849002,0.0018832412315532565,-0.003904148004949093,0.001164337620139122,0.0038968115113675594,0.0030564262997359037,0.0002058401150861755,-0.002198415109887719,0.0032414819579571486,0.003652922809123993,0.0016236837254837155,0.0038159661926329136,0.0016936019528657198,-0.001772071816958487,-0.00010917303734458983,-0.0014565670862793922,0.002417120849713683,-0.0005687113152816892,-0.0034190653823316097,-0.0007575388299301267,-0.001252891612239182,-0.0027461089193820953,-0.0023173403460532427,-0.0029359806794673204,-0.002850216580554843,-0.0017987043829634786,-0.0008257846347987652,-0.0010036969324573874,0.0014453422045335174,0.00341025716625154,-0.0032494780607521534,0.0027634731959551573,0.0017919924575835466,0.0022179163061082363,0.0024956485722213984,-0.0036200007889419794,-0.003190566785633564,-0.0038656918331980705,0.002872048644348979,-0.000748040562029928,0.0018736685160547495,0.0032249221112579107,-5.06550975387654e-07,0.000869029900059104,0.00019358645658940077,0.0020771438721567392,0.0028800060972571373,-0.0009861814323812723,0.0011081903940066695,-0.0009238282218575478,-0.0023052829783409834,-0.0035225225146859884,0.00229478906840086,0.0031650331802666187,-0.0007591700414195657,-0.0024068383499979973,-0.0011479462264105678,0.0004679410776589066,-0.000213331775739789,-0.002047076588496566,0.0023680971935391426,-0.0008420454687438905,0.0018676763866096735,-0.0028614168986678123,-0.002850909484550357,0.00029416586039587855,-0.0005077925743535161,-0.00020983308786526322,0.002263167407363653,0.001448919763788581,-0.002695369767025113
3,0.00226890342310071,0.0004349398659542203,0.002644110703840852,0.001253339578397572,8.279987923742738e-06,-0.001131041324697435,-0.00016871494881343096,0.0037425034679472446,-0.0004944991669617593,0.002063847379758954,0.0016815878916531801,0.001273772562853992,0.0002401862875558436,0.0005198025610297918,-0.002226941753178835,-0.0036225663498044014,-0.001742878113873303,-0.0021801230031996965,-0.0013626416912302375,0.0028272985946387053,-0.00212662061676383,0.000749408733099699,0.002696521580219269,-0.0001593382767168805,0.003844756865873933,-0.0003320556425023824,0.0022467942908406258,-0.0004261289141140878,0.00248247804120183,-0.0006328768795356154,0.0022790112998336554,0.0008149463101290166,-0.0001626605517230928,-0.002950516762211919,0.002453156979754567,-0.00021675876632798463,0.00034372464870102704,0.0004411549598444253,0.0023702923208475113,0.0001581401302246377,-7.819894562999252e-06,-0.003531169844791293,-0.003829261288046837,-0.0020093373022973537,-0.001471416442655027,-0.000981805962510407,-0.003142383648082614,0.0007026693201623857,-0.002985609695315361,0.0016819903394207358,-0.0034136781468987465,0.0023745307698845863,0.003282897174358368,0.0034192968159914017,0.0018797536613419652,0.0031688385643064976,0.003438518615439534,-0.0013528221752494574,-0.0014454845804721117,-0.0003827727632597089,-0.0033089127391576767,-0.0007924533565528691,-0.0027770015876740217,-0.0031972306314855814,-0.0018739199731498957,0.0017090975306928158,0.001514140865765512,0.0014564525336027145,-0.0010264359880238771,0.0008885915740393102,0.0016617580549791455,-0.0020448844879865646,0.001694984151981771,0.003790121991187334,-2.4743640096858144e-05,6.94061818649061e-05,-0.0027131030801683664,0.0032475837506353855,-0.0034968117251992226,-0.00022669867030344903,-0.0009578681201674044,0.003755686804652214,0.0009502334869466722,-0.001079508918337524,0.0006866870098747313,-0.0010925616370514035,0.0007450280827470124,0.0007697222172282636,0.0005540053825825453,-0.0027229939587414265,-0.0013332952512428164,0.003556907642632723,0.0016613180050626397,-0.002592679113149643,-0.0011174503015354276,-0.0005371844163164496,-0.0006635829340666533,0.0006453921087086201,-8.852890459820628e-05,-0.0033397218212485313,0.0005219417507760227,-0.001324216602370143,0.0024543653707951307,-0.0019174609333276749,-3.89934066333808e-05,0.0010165907442569733,-0.003580202581360936,0.0033339345827698708,0.0022478068713098764,-0.0020412227604538202,-0.000650318805128336,0.0029213305097073317,0.002677810611203313,-0.002436918905004859,-0.0012986373621970415,-0.00012114057608414441,0.0016735560493543744,0.0024930897634476423,-0.0013271743664517999,0.0001399534085066989,8.283466013381258e-05,0.003352649509906769,-0.0014204017352312803,0.002870027907192707,0.0021030264906585217,0.0011896740179508924,-0.003351186867803335,-0.0006756282527931035
4,0.0019581944216042757,-0.0022107346449047327,-0.0011561198625713587,-0.002886272268369794,0.001594733214005828,-0.001594776171259582,0.0026919764932245016,-0.00012680278450716287,-0.0005581804434768856,-0.0035449275746941566,0.0037003939505666494,0.0012354212813079357,-0.0009580010082572699,0.00014428250142373145,-0.0014593577943742275,-0.002729069208726287,0.0005723288049921393,0.0035142232663929462,0.003921943251043558,-0.003527914872393012,0.0019456726731732488,0.0022559280041605234,-0.0005423698457889259,0.001034521614201367,-0.0013501851353794336,0.0027604158967733383,-0.002807273529469967,0.00024234205193351954,-0.0016576320631429553,-0.0012966239592060447,-0.002213762840256095,-0.003365762997418642,-0.0006228607380762696,-0.002895353827625513,-0.0035488856956362724,0.0013820005115121603,-0.0009534824639558792,0.0018478281563147902,0.0034116727765649557,-0.0006756664952263236,0.0024415014777332544,-0.0029353690333664417,-0.0025126056279987097,-0.0009988313540816307,0.0036377261858433485,0.002995045855641365,-0.0019054751610383391,0.0029589999467134476,0.0008932192577049136,-0.0013001038460060954,-0.0020715310238301754,-0.00020094586943741888,0.002570840297266841,0.00380155723541975,-0.002473586704581976,0.0011213001562282443,0.0031946205999702215,9.909242362482473e-05,0.001311403582803905,0.002204875461757183,-0.0011220034211874008,0.0016122518572956324,0.002348880050703883,0.0035365682560950518,0.000909817055799067,0.002755098743364215,-0.003383677452802658,0.0009969278471544385,0.00048190669622272253,-0.0015868479385972023,0.0004751212545670569,-0.0006727377185598016,0.0024358814116567373,-0.0016325784381479025,7.368947990471497e-05,0.0021812072955071926,8.241814066423103e-05,-0.0024973764084279537,0.002266516676172614,0.0037054966669529676,0.0005226759240031242,0.00389161822386086,0.001365591655485332,-0.0012526890495792031,-0.0006436291732825339,0.0026691495440900326,0.0006279962253756821,-0.00046710699098184705,-0.003736028913408518,-0.0007733455277048051,-0.0025140242651104927,-0.0011113928630948067,0.000681806995999068,-0.0023756904993206263,0.0032400900963693857,0.0034952759742736816,4.563291804515757e-05,0.003599037416279316,-0.0007084793760441244,0.0026228672359138727,0.0018671357538551092,0.00361993839032948,-0.0032833567820489407,0.0009056946146301925,0.002505404641851783,0.0018368527526035905,0.0023507531732320786,-0.0016589193837717175,-0.0010653110221028328,-0.0034096611198037863,0.0037409814540296793,-0.0033911853097379208,-0.0016950403805822134,0.0009163717622868717,0.0020247497595846653,0.0028596229385584593,-0.00021874137746635824,-0.00133735709823668,0.0036610793322324753,0.0022860660683363676,-0.0016334510874003172,-0.003308979794383049,-0.003778632963076234,0.0006930023082531989,0.0002600970910862088,0.003300223732367158,-0.0016991931479424238,0.000940822996199131
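One way to put a number on "vastly different" is pairwise cosine similarity between the rows. The sketch below is only an illustration: `SAMPLE_CSV` holds just the first three of the 128 columns for graphs 0 to 2, copied from the table above to keep the example short, so its output says nothing about the full vectors. The helper names are hypothetical, and the real check would read the complete attached file instead of the inline sample.

```python
import csv
import io
import math

# Truncated sample: first three of the 128 columns for graphs 0-2 (illustration only).
SAMPLE_CSV = """type,x_0,x_1,x_2
0,-0.002921980805695057,0.0019917855970561504,0.00042243595817126334
1,-0.0007899366319179535,0.0034763035364449024,-0.0031649558804929256
2,-0.003191772848367691,0.0001835404837038368,-0.0011708770180121064
"""


def load_embeddings(handle):
    """Read a features CSV into {graph id: list of floats}."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return {row[0]: [float(value) for value in row[1:]] for row in reader}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


embeddings = load_embeddings(io.StringIO(SAMPLE_CSV))
for first in embeddings:
    for second in embeddings:
        if first < second:
            score = cosine_similarity(embeddings[first], embeddings[second])
            print(f"graph {first} vs graph {second}: {score:.3f}")
```

Identical graphs with a meaningful embedding would score close to 1 for every pair; scores scattered around 0 on the full 128-dimensional rows would match the "laid out uniformly around the origin" behaviour described in the reply.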