JuliaLinearAlgebra / MatrixDepot.jl

An Extensible Test Matrix Collection for Julia
http://matrixdepotjl.readthedocs.org/
Other
74 stars 22 forks source link

Matrices without values load as -1 #17

Closed jpfairbanks closed 9 years ago

jpfairbanks commented 9 years ago

If the mtx file has a blank 3rd column, matrixdepot loads the matrix with values -1. For example the Newman/polbooks matrix now edge weights (values column is empty) This works on Newman/football which provides the values column as ones(n). Is this a bug or the intended behavior? If intended, why?

julia> using MatrixDepot; a = matrixdepot("Newman/polbooks", :r)[1:10,1:10]
10x10 Array{Float64,2}:
  0.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0   0.0   0.0   0.0
 -1.0   0.0   0.0  -1.0   0.0  -1.0  -1.0   0.0   0.0   0.0
 -1.0   0.0   0.0   0.0  -1.0  -1.0   0.0  -1.0   0.0   0.0
 -1.0  -1.0   0.0   0.0   0.0  -1.0   0.0   0.0  -1.0  -1.0
 -1.0   0.0  -1.0   0.0   0.0  -1.0  -1.0   0.0   0.0   0.0
 -1.0  -1.0  -1.0  -1.0  -1.0   0.0  -1.0  -1.0   0.0   0.0
 -1.0  -1.0   0.0   0.0  -1.0  -1.0   0.0  -1.0   0.0   0.0
  0.0   0.0  -1.0   0.0   0.0  -1.0  -1.0   0.0   0.0   0.0
  0.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0   0.0  -1.0
  0.0   0.0   0.0  -1.0   0.0   0.0   0.0   0.0  -1.0   0.0

Here are the files

> head -n 50 polbooks.mtx                                                    /h/j/./v/M/d/u/Newman (git)-[master] 
%%MatrixMarket matrix coordinate pattern symmetric
%-------------------------------------------------------------------------------
% UF Sparse Matrix Collection, Tim Davis
% http://www.cise.ufl.edu/research/sparse/matrices/Newman/polbooks
% name: Newman/polbooks
% [Books about US politics, compiled by V. Krebs]
% id: 2403
% date: 2001
% author: V. Krebs
% ed: M. Newman
% fields: name title A id date author kind notes aux ed
% aux: nodename nodevalue
% kind: undirected graph
%-------------------------------------------------------------------------------
% notes:
% Network collection from M. Newman                                         
% http://www-personal.umich.edu/~mejn/netdata/                              
%                                                                           
% Books about US politics                                                   
% Compiled by Valdis Krebs                                                  
%                                                                           
% Nodes represent books about US politics sold by the online bookseller     
% Amazon.com.  Edges represent frequent co-purchasing of books by the same  
% buyers, as indicated by the "customers who bought this book also bought   
% these other books" feature on Amazon.                                     
%                                                                           
% Nodes have been given values "l", "n", or "c" to indicate whether they are
% "liberal", "neutral", or "conservative".  These alignments were assigned  
% separately by Mark Newman based on a reading of the descriptions and      
% reviews of the books posted on Amazon.                                    
%                                                                           
% These data should be cited as V. Krebs, unpublished,                      
% http://www.orgnet.com/.                                                   
%-------------------------------------------------------------------------------
105 105 441
2 1
3 1
4 1
5 1
6 1
7 1
4 2
6 2
7 2
5 3
6 3
8 3
6 4
9 4
10 4
> head -n 50 football.mtx                                                    /h/j/./v/M/d/u/Newman (git)-[master] 
%%MatrixMarket matrix coordinate integer symmetric
%-------------------------------------------------------------------------------
% UF Sparse Matrix Collection, Tim Davis
% http://www.cise.ufl.edu/research/sparse/matrices/Newman/football
% name: Newman/football
% [American football games between Div IA colleges, Fall 2000]
% id: 2397
% date: 2002
% author: M. Girvan, M. Newman
% ed: M. Newman
% fields: name title A id date author kind notes aux ed
% aux: nodename nodevalue
% kind: undirected multigraph
%-------------------------------------------------------------------------------
% notes:
% Network collection from M. Newman                                        
% http://www-personal.umich.edu/~mejn/netdata/                             
%                                                                          
% The graph football contains the network of American football games       
% between Division IA colleges during regular season Fall 2000, as compiled
% by M. Girvan and M. Newman.  The nodes have values that indicate to which
% conferences they belong.  The values are as follows:                     
%                                                                          
%   0 = Atlantic Coast                                                     
%   1 = Big East                                                           
%   2 = Big Ten                                                            
%   3 = Big Twelve                                                         
%   4 = Conference USA                                                     
%   5 = Independents                                                       
%   6 = Mid-American                                                       
%   7 = Mountain West                                                      
%   8 = Pacific Ten                                                        
%   9 = Southeastern                                                       
%  10 = Sun Belt                                                           
%  11 = Western Athletic                                                   
%                                                                          
% If you make use of these data, please cite M. Girvan and M. E. J. Newman,
% Community structure in social and biological networks,                   
% Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002).                         
%-------------------------------------------------------------------------------
115 115 613
2 1 1
5 1 1
10 1 1
17 1 1
24 1 1
34 1 1
36 1 1
42 1 1
66 1 1
weijianzhg commented 9 years ago

Hi @jpfairbanks,

Thanks for the question. Matrix Depot use Base.SparseMatrix.CHOLMOD.Sparse to read data in Matrix Market format, which I think is a wrapper of CHOLMOD (http://fossies.org/linux/SuiteSparse/CHOLMOD/Check/cholmod_read.c).

For example,

julia> sparse(Base.SparseMatrix.CHOLMOD.Sparse("polbooks.mtx"))
105x105 Base.LinAlg.Symmetric{Float64,Base.SparseMatrix.SparseMatrixCSC{Float64,Int64}}:
  0.0  -1.0  -1.0  -1.0  -1.0  -1.0  …   0.0   0.0   0.0  0.0   0.0   0.0
 -1.0   0.0   0.0  -1.0   0.0  -1.0      0.0   0.0   0.0  0.0   0.0   0.0
 -1.0   0.0   0.0   0.0  -1.0  -1.0      0.0   0.0   0.0  0.0   0.0   0.0
 -1.0  -1.0   0.0   0.0   0.0  -1.0      0.0   0.0   0.0  0.0   0.0   0.0
 -1.0   0.0  -1.0   0.0   0.0  -1.0      0.0   0.0   0.0  0.0   0.0   0.0
 -1.0  -1.0  -1.0  -1.0  -1.0   0.0  …   0.0   0.0   0.0  0.0   0.0   0.0
 -1.0  -1.0   0.0   0.0  -1.0  -1.0      0.0   0.0   0.0  0.0   0.0   0.0
  0.0   0.0  -1.0   0.0   0.0  -1.0      0.0   0.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0  -1.0   0.0   0.0      0.0   0.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0  -1.0   0.0   0.0      0.0   0.0   0.0  0.0   0.0   0.0
  ⋮                             ⋮    ⋱         ⋮                         
  0.0   0.0   0.0   0.0   0.0   0.0      0.0  -1.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0   0.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0  -1.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0  -1.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0  …  -1.0   0.0  -1.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0  -1.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0   0.0   0.0  0.0   0.0   0.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0   0.0   0.0  0.0   0.0  -1.0
  0.0   0.0   0.0   0.0   0.0   0.0      0.0   0.0   0.0  0.0  -1.0   0.0

But unfortunately I didn't find any explanations in the comments. Please let me know if you figure it out yourself.

Best wishes,

Weijian

jpfairbanks commented 9 years ago

These comments from your linked file look relevant:

  112  *
  113  * If Common->prefer_binary is set to its default value of FALSE, then
  114  * for symmetric pattern-only matrices, the kth diagonal (if present) is set to
  115  * one plus the degree of the row/column k, and the off-diagonal entries are set
  116  * to -1.  A symmetric pattern-only matrix with a zero-free diagonal is thus
  117  * converted into a symmetric positive definite matrix.  All entries are set to
  118  * one for an unsymmetric pattern-only matrix.  This differs from the
  119  * Matrix Market format (A = mmread ('file') returns a binary pattern for A for
  120  * symmetric pattern-only matrices).  If Common->prefer_binary is TRUE, then
  121  * this function returns a binary matrix (just like mmread('file')).

I think it is trying to make a graph Laplacian but entry [i,i] is not present in the file so it drops the degrees and keeps the -1's. I can see both behaviors being useful but the principle of least surprise suggests setting Commpn->prefer_binary=true although I don't know what other consequences that will have.

weijianzhg commented 9 years ago

Great, thanks!