Closed GoogleCodeExporter closed 9 years ago
Does anyone know how to judge whether the data have errors? If two
universities have the same numbers in any item, I would think it is an error.
Anything else?
Original comment by sdpa...@gmail.com
on 17 Feb 2012 at 8:54
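One way to apply the "same numbers at two universities" rule is to flag groups of rows where different schools report exactly the same values across a set of columns. That pattern usually points to a copy-paste error. This is a minimal sketch that assumes each spreadsheet row has been loaded as a dict. The column names (`school`, `enrollment`, `faculty_full`) are hypothetical and are not the real headers.

```python
from collections import defaultdict

def find_copied_rows(rows, columns, key="school"):
    """Return lists of schools whose values in `columns` are all identical."""
    groups = defaultdict(set)
    for row in rows:
        signature = tuple(row.get(col) for col in columns)
        # An all-blank signature means "not reported", not a copied row.
        if all(value in (None, "") for value in signature):
            continue
        groups[signature].add(row[key])
    # Keep only signatures shared by more than one school.
    return [sorted(schools) for schools in groups.values() if len(schools) > 1]

# Hypothetical example rows: A and B report identical numbers.
rows = [
    {"school": "A", "enrollment": 12000, "faculty_full": 640},
    {"school": "B", "enrollment": 12000, "faculty_full": 640},
    {"school": "C", "enrollment": 8500, "faculty_full": 410},
]
print(find_copied_rows(rows, ["enrollment", "faculty_full"]))  # → [['A', 'B']]
```

Schools are collected in a set, so one school reporting identical values in two different years is not flagged here. That case needs a separate check.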
I judged errors on the basis of the surrounding information. There were a few
numbers that seemed very low, but again I am unsure whether they are a problem or
real data points. Any recommendations?
EM
Original comment by libbymon...@gmail.com
on 17 Feb 2012 at 9:02
Attachments:
I'd say make a note of anything that looks suspicious in the metadata.
Just pointing out what *might* be wrong should help...
If you think you know what it *should* be, then change it.
Original comment by icos.atr...@gmail.com
on 18 Feb 2012 at 4:55
I'm unsure how public vs. private, city demographics, etc. will be assigned.
Same as zip codes?
Thanks!
Original comment by tracy.a....@gmail.com
on 18 Feb 2012 at 5:02
@Tracy -- it looks like Mary can get those from a public database, so we
don't need to enter or error-check them (see Mary's "NCES" issue). This issue
is just for error-checking columns.
Original comment by icos.atr...@gmail.com
on 18 Feb 2012 at 5:08
Does anyone have any standards for checking these errors? I worry that I will
change data that are good.
Also, should I fix the errors in the original data sheet and highlight them, or
should I enter the new data in the metadata sheet?
Original comment by sdpa...@gmail.com
on 18 Feb 2012 at 4:25
If you think you know what it *should* be, change it in the original data
sheet, and note the change in the metadata sheet.
If it looks suspicious, then make a note in the metadata sheet.
Don't highlight anything. These won't necessarily be processed in Excel.
Original comment by icos.atr...@gmail.com
on 19 Feb 2012 at 3:21
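The workflow above has three parts: change the data when you are confident, record every change, and only note suspicious values. Below is a minimal sketch of how that could be kept as a plain CSV log instead of highlighting. The log columns (`row`, `column`, `old`, `new`, `note`) and the helper names are hypothetical. They are not the format the group actually used.

```python
import csv
import io

LOG_FIELDS = ["row", "column", "old", "new", "note"]  # hypothetical log layout

def apply_correction(rows, log, row_index, column, new_value, note):
    """Change a value in the data and record the change in the metadata log."""
    old_value = rows[row_index][column]
    rows[row_index][column] = new_value
    log.append({"row": row_index, "column": column,
                "old": old_value, "new": new_value, "note": note})

def flag_suspect(rows, log, row_index, column, note):
    """Record a suspicious value without changing the data."""
    log.append({"row": row_index, "column": column,
                "old": rows[row_index][column], "new": "", "note": note})

def write_metadata(log, handle):
    """Write the log as plain CSV; no highlighting, so any tool can read it."""
    writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
    writer.writeheader()
    writer.writerows(log)

rows = [{"school": "A", "year": "2019"}, {"school": "B", "year": "2009"}]
log = []
apply_correction(rows, log, 0, "year", "2009", "year typo")
flag_suspect(rows, log, 1, "year", "check against source")
buffer = io.StringIO()
write_metadata(log, buffer)
print(buffer.getvalue())
```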
I've got some huge variation in my data, but I'm not sure if these are errors.
This includes many gaps where values were not reported. What should I do?
Original comment by SKMcCorm...@gmail.com
on 20 Feb 2012 at 11:45
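For the "huge variation" question, one rough screen compares each year's value with the same school's median across the years it reported. Gaps are skipped, and values that differ from the median by more than some factor are flagged. This is a hedged sketch. The factor of 10 and the minimum of three reported years are arbitrary assumptions. A flagged value is only a candidate for a note in the metadata, not proof of an error.

```python
from statistics import median

def flag_jumps(history, factor=10, min_years=3):
    """history: list of (year, value) pairs; None marks a year that was not reported."""
    reported = [(year, value) for year, value in history if value is not None]
    if len(reported) < min_years:
        return []  # too few points to judge variation
    typical = median(value for _, value in reported)
    if typical <= 0:
        return []  # a ratio test is meaningless around zero
    return [(year, value) for year, value in reported
            if value > typical * factor or value < typical / factor]

history = [(2008, 100), (2009, 110), (2010, None), (2011, 5000), (2012, 95)]
print(flag_jumps(history))  # → [(2011, 5000)]
```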
I'm not sure where to make a change in the metadata, so I hope this works for
everyone.
I didn't notice any obvious errors in N (full-fac), O (part-fac), P
(full-staff), Q (part-staff). I tried to check current reports on
CollegeStats.org, and most seemed to be within a reasonable range of the current
stats reported. There were several blanks, which I left empty. Some schools
reported only full-time faculty and staff, or only full-time faculty.
This is only an observation I wanted to note; I'm not sure whether it is a real concern.
Original comment by tracy.a....@gmail.com
on 21 Feb 2012 at 4:14
I don't see any obvious errors in my data either (R, S, T, V, Y).
Xiaoben
Original comment by sdpa...@gmail.com
on 21 Feb 2012 at 4:16
I did find one potential error, AE164 seems to be incorrect. Other than that
everything looks ok.
~Jason
Original comment by jayco...@gmail.com
on 21 Feb 2012 at 5:49
Attachments:
I found quite a few errors in duplicate reporting, incorrect years, etc. So I
modified the data file and listed all of my changes in the metafile. Hope the
format I used in the metafile works; if not, let me know and I can change it.
Original comment by drcolm...@gmail.com
on 21 Feb 2012 at 5:53
Attachments:
@Kevin My understanding is that for a lot of the values you're looking at, some
schools may not use those particular types of energy source, or they may
not report them and/or may not have data for them. I think it'd be best to
treat them as either not reported or no data available when no values are
given. I'm working on writing up a summary of what goes into each piece of the
data, how it's calculated, etc., so some of this is clearer to the analysis
committee. It should be done in the next couple of days.
Original comment by drcolm...@gmail.com
on 21 Feb 2012 at 5:58
Hey guys, these are the changes I made:
I added everything to the two files Dan sent.
-Adeline
Original comment by adelinem...@gmail.com
on 21 Feb 2012 at 7:18
Attachments:
Here are my corrections. I found a few errors, but for the majority of the data
the only problem may be the zero scores, which for the larger colleges may
indicate a non-reporting year.
Original comment by SKMcCorm...@gmail.com
on 21 Feb 2012 at 7:52
Attachments:
Sorry I haven't finished these yet. Not forgotten. Is someone planning on
merging these together?
Original comment by icos.atr...@gmail.com
on 23 Feb 2012 at 11:43
My checked data is attached. I did not change anything. The problems are listed
in the metadata next to the appropriate line number. The problems I saw were
decimals in student enrollment, same enrollment listed for several years, and
for a few institutions the breakdown of enrollment exceeded the total
enrollment. I can check some of these things with our other data sources. If I
change anything, I will submit an updated worksheet.
Mary
Original comment by marymai...@gmail.com
on 23 Feb 2012 at 3:28
Attachments:
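Mary's three problems can be checked mechanically: decimal enrollments, the same enrollment repeated across years, and a breakdown that exceeds the total. This is a minimal sketch. The column names (`enrollment_total`, `enrollment_full`, `enrollment_part`) are hypothetical. It also assumes the breakdown is full-time plus part-time, which may not match the real columns. A repeated enrollment can occasionally be genuine, so the output is a list to review, not a list to delete.

```python
from collections import Counter, defaultdict

def check_enrollment(rows):
    """Return (location, problem) pairs for three enrollment sanity checks."""
    problems = []
    totals_by_school = defaultdict(list)
    for i, row in enumerate(rows):
        total = row["enrollment_total"]
        if total is None:
            continue  # not reported; nothing to check
        if not float(total).is_integer():
            problems.append((i, "decimal enrollment"))
        # Assumed breakdown: full-time + part-time; skipped if either part is missing.
        parts = (row.get("enrollment_full"), row.get("enrollment_part"))
        if None not in parts and sum(parts) > total:
            problems.append((i, "breakdown exceeds total"))
        totals_by_school[row["school"]].append(total)
    # The same total in more than one year for one school is suspicious.
    for school, totals in totals_by_school.items():
        for total, years in Counter(totals).items():
            if years > 1:
                problems.append((school, f"enrollment {total} repeated in {years} years"))
    return problems

rows = [
    {"school": "A", "year": 2008, "enrollment_total": 9000, "enrollment_full": 7000, "enrollment_part": 2000},
    {"school": "A", "year": 2009, "enrollment_total": 9000, "enrollment_full": 7000, "enrollment_part": 2000},
    {"school": "B", "year": 2009, "enrollment_total": 5000.5, "enrollment_full": 4000, "enrollment_part": 1500},
]
for problem in check_enrollment(rows):
    print(problem)
```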
I can compile it!
-Adeline
Original comment by adelinem...@gmail.com
on 23 Feb 2012 at 5:58
So here's the merged data. I've got everyone's corrections in, except for
Christian's. I suggest everyone take a look at the META data file. I've
highlighted a few suspicious entries that should be fixed, or at least looked
over.
I changed all the location references so that they match the last file that Dan
sent. But when that was too much of a pain, I gave the school+year as the
reference to the main file. I also included relevant comments from this thread.
-Adeline
Original comment by adelinem...@gmail.com
on 27 Feb 2012 at 4:54
Attachments:
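The metadata points at spreadsheet cells such as AE164, so it helps to be able to turn such a reference into row and column positions in the CSV. Below is a minimal sketch. It assumes plain A1-style references without a sheet name or `$` signs, and the function name is hypothetical. Row 1 of the spreadsheet is usually the header row. If the CSV is read with the header as row 0, the returned row index lines up directly. If headers are read separately, subtract one more.

```python
import re

def parse_cell_ref(ref):
    """Convert an A1-style reference such as 'AE164' to zero-based (row, column)."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", ref.strip().upper())
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        # Column letters are a base-26 number with digits A=1 .. Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

print(parse_cell_ref("AE164"))  # → (163, 30)
```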
For my two highlighted comments I'm not sure what to do. I very much doubt that
all the reported zeros actually mean a zero measurement. For many of these I
believe that they represent non-reported scores, and as such they may create an
irregular distribution.
Original comment by SKMcCorm...@gmail.com
on 27 Feb 2012 at 6:59
Maybe we should recode the zeros as N/A. I don't think it will distort the
whole picture, and statistical software will ignore it. Or we can let
the data group decide.
Original comment by sdpa...@gmail.com
on 27 Feb 2012 at 4:14
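The proposal to recode zeros as N/A is easy to try and compare. This sketch replaces 0 with `None` in chosen columns and averages only the reported values. The thread assumes statistical software will skip N/A values. In some packages that has to be requested explicitly; in R's `mean`, for example, it needs `na.rm = TRUE`. The column name `energy_use` is hypothetical. The recode should only be applied to columns where a true zero is implausible.

```python
from statistics import mean

def zeros_to_missing(rows, columns):
    """Return copies of the rows with 0 in `columns` replaced by None (N/A)."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # leave the original data untouched
        for col in columns:
            if new_row.get(col) == 0:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned

def mean_ignoring_missing(rows, column):
    """Average only the reported (non-None) values in `column`."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return mean(values) if values else None

rows = [{"energy_use": 100}, {"energy_use": 0}, {"energy_use": 200}]
print(mean_ignoring_missing(rows, "energy_use"))                                   # zeros counted
print(mean_ignoring_missing(zeros_to_missing(rows, ["energy_use"]), "energy_use"))  # zeros as N/A
```

In this toy example the average rises from 100 to 150. This shows how much a few non-reporting zeros can pull a summary down.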
Yeah I agree with Xiaoben.
There are still 2 double entries (last two highlighted items). Do you want to
fix those, Dan? Since you deleted all the other doubles, I feel it would be
more consistent for you to judge which to keep. Also, you should probably leave
the deleted entries as blank rows so that it doesn't shift all the reference
locations in the meta data.
And as for the negative values...should we assume they are supposed to be
positive, or discard them altogether?
The remaining highlighted items deal with outrageously high numbers. We should
probably do a little background check to make sure they're acceptable.
-Adeline
Original comment by adelinem...@gmail.com
on 27 Feb 2012 at 4:36
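The "blank rows instead of deleting them" suggestion keeps every row number stable, so references in the metadata still point at the right entries. Below is a minimal sketch, assuming duplicates are identified by a hypothetical (`school`, `year`) key. It keeps the first occurrence. That is an arbitrary default, and which copy to keep is still a judgment call.

```python
def blank_duplicates(rows, key_columns=("school", "year")):
    """Blank out repeated rows in place (keeping the first) and return their indices.

    Rows are emptied rather than deleted, so row numbers cited in the
    metadata still refer to the same entries.
    """
    seen = set()
    blanked = []
    for i, row in enumerate(rows):
        key = tuple(row.get(col) for col in key_columns)
        if all(value in (None, "") for value in key):
            continue  # already blank
        if key in seen:
            rows[i] = {col: "" for col in row}  # same columns, empty values
            blanked.append(i)
        else:
            seen.add(key)
    return blanked

rows = [
    {"school": "A", "year": 2009, "enrollment": 9000},
    {"school": "A", "year": 2009, "enrollment": 9100},
    {"school": "B", "year": 2009, "enrollment": 5000},
]
print(blank_duplicates(rows))  # → [1]
print(len(rows))               # → 3
```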
Yeah, I'll fix the double entry and make sure not to delete the rows (sorry, lack
of forward thinking on that one). Were there a lot of negative values for
different variables? If it's a small portion, we can spot-check them to see
whether they're clearly supposed to be positive; if it's too ambiguous to call,
we can just discard them.
Original comment by drcolm...@gmail.com
on 28 Feb 2012 at 4:31
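To answer "is it a small portion?" before spot-checking, one could count the negative values per column as a share of the values actually reported. This is a minimal sketch. The column name is hypothetical, and what counts as "small" is left to the group.

```python
def negative_report(rows, columns):
    """For each column: share of reported numeric values that are negative, and their row indices."""
    report = {}
    for col in columns:
        # Only numeric entries count as reported; blanks and None are skipped.
        reported = [(i, row[col]) for i, row in enumerate(rows)
                    if isinstance(row.get(col), (int, float))]
        negative_rows = [i for i, value in reported if value < 0]
        share = len(negative_rows) / len(reported) if reported else 0.0
        report[col] = {"share": share, "rows": negative_rows}
    return report

rows = [{"energy_use": 120}, {"energy_use": -35}, {"energy_use": 90}, {"energy_use": None}]
print(negative_report(rows, ["energy_use"]))
```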
Attached are my checks. Just a few edits and some comments.
Original comment by icos.atr...@gmail.com
on 2 Mar 2012 at 10:01
Attachments:
So I finally have everyone's corrections. I've updated the meta data file,
highlighting new issues that have arisen. I also un-highlighted problems that
we've solved, specifically those pertaining to the data columns we decided not
to keep. So again, I recommend you guys check out the meta data file to see if
you can make any corrections.
-Adeline
Original comment by adelinem...@gmail.com
on 2 Mar 2012 at 6:10
Attachments:
The finished file is here:
http://unm-macroecology-2012.googlecode.com/files/ACUPCC-clean-finished-allcols.csv
Final metadata, including list of changes and suspect entries, is here:
http://unm-macroecology-2012.googlecode.com/files/FIN_acupcc_Meta.csv
Original comment by icos.atr...@gmail.com
on 6 Mar 2012 at 6:03
Original issue reported on code.google.com by
adelinem...@gmail.com
on 17 Feb 2012 at 2:03