Revision - Transfer some current AE data elements to the Outlet file

IMLS / public-libraries-survey

FY 2026 IMLS Public Libraries Survey: Solicitation of Data Elements Changes

6 stars 3 forks source link

Revision - Transfer some current AE data elements to the Outlet file #68

Open enielsen-air opened 1 year ago

enielsen-air commented 1 year ago

Name: Evan Nielsen

State/Affiliation: American Institutes for Research (AIR)

Description of Change: Transfer some or all of the following data elements from the AE file to the Outlet file (or at least assess the feasibility of transferring):

501 Library Visits (VISITS)
650 Number of Internet Computers Used by General Public (GPTERMS)
651 Number of Uses (Sessions) of Public Internet Computers Per Year (PITUSR)
652 Wireless Sessions (WIFISESS)
Also explore feasibility of transferring physical holdings and physical circulation

Justification: With the existing PLS data schema, it is difficult to analyze library services and use at a local level within multi-outlet systems, which are predominantly in urban and suburban areas. Transferring existing PLS data elements from the AE to Outlet levels would address this problem and allow data users, especially those interested in urban and suburban areas, to have access to more location-specific data. These data element transfers would only affect multi-outlet systems, so the 80% of AEs in the PLS universe that are single-outlet would have no change to their reporting burden.

Potential Methodological Issues: The current reporting method indicators for #501, #651, and #652 would also need to be moved to the outlet level. Furthermore, new imputation methods for the outlet file would be needed to fill in missing data, as most current methods only work with the LSA population value of the AE to create imputation strata.

States Already Collecting: Georgia already collects all PLS data elements at the outlet level.

mgolrick commented 1 year ago

The burden on my libraries for this change would be astronomical. There are (now) 68 AEs, but I have about 340 outlets. The data collection burden on each of the systems would be an increase in orders of magnitude. States where the number of AEs and outlets are close to the same would see no burden, states where the services are provided by many systems with multiple outlets would see a huge increase in burden. I am unalterably opposed to data collection at this level. My libraries would likely revolt.

megdepriest commented 1 year ago

California: 186 AEs, 1130 Outlets. Increases reporting burden significantly. 63% of CA libraries have more than one outlet. 27 AEs have more than 10 (664 outlets for those 27 AEs). That's a LOT of extra reporting.

enielsen-air commented 1 year ago

@mgolrick and @megdepriest , thank you for your feedback! I should have acknowledged that while 80% of all AEs are single-outlet, that proportion varies significantly from state to state.

That said, I have to push back a little: Am I wrong that these data elements are counted at the outlet level already? For example, how would one determine the number of visits for an entire AE without collecting the counts/estimates from each individual location and summing them. And @mgolrick , one of your catch phrases is, "I never let them do math." So why are visits or any of these other data elements an exception?

I think that even if we could just move the number of visits to the outlet level, that, along with HOURS and SQ_FEET could allow the ability to model the level of activity at individual library outlets, rather than only at the service area (jurisdiction) level.

megdepriest commented 1 year ago

I hear you, Evan! The burden lies more in the transition, I think. But manageable.

rfuquaSLO commented 1 year ago

This may be one of those rare instances where I find myself agreeing with @mgolrick 110%.

I would have a very difficult time being persuaded that the quality of data gathered through this potential change would be worth the increased burden. Also, I find this proposal a bit surprising given recent discussions about trying to better manage libraries' reporting burden in the context of the PLS.

I definitely get the general burden argument @enielsen-air has posed here -- 80% of Oregon libraries are single-outlet AEs, so NBD for them. However, the other 20% would likely show up at my door with torches and pitchforks if we tried to implement these changes.

Our rationale: - #501 Library Visits (VISITS) - these are extremely messy figures. Last year (FY22) over a quarter of Oregon libraries either reported their counting method as an estimate, or reported that they do not track visits at all. Many of our locations and branches do not have automated gate counters, and those that do are constantly lamenting accuracy issues with them. It's messy data nationally, and I don't see outlet-level data improving that. - #650 Number of Internet Computers Used by General Public (GPTERMS) - While I feel this is a critical thing for libraries to offer, I'm not sure what having outlet-level granularity of this data will help illustrate. It seems like there is pretty good evidence that libraries have reached "Peak-GPTERMS," given the pandemic, the cost of real estate, and the transition to a reality where more and more patrons bring their own devices. - #651 Number of Uses (Sessions) of Public Internet Computers Per Year (PITUSR) - see above comment; a quarter of our libraries do not track this stat at all, or only provide estimates. - #652 Wireless Sessions (WIFISESS) - again, VERY messy data here. Over 60% -- yes, you are ready that correctly, over 60% -- of Oregon libraries last year reported this number as either an estimate or did not track wi-fi sessions at all. - Also explore feasibility of transferring physical holdings and physical circulation - Many of our multi-outlet libraries have floating collections (i.e., collections that are not necessarily assigned a physical location/branch). While I'm sure most of our multi-outlet libraries are tracking physical circulation by location, we would really need to hammer out some solid guidance for how to handle both item counts and circulation for floating collections.

Apologies for the diatribe here, but this proposal perfectly illustrates some growing concerns about the PLS we have in my state regarding data quality and scope. How much more detail and/or granularity can we continue to reasonably achieve through data that's mostly self-reported (mostly on a volunteer basis!) by over 9,000 AEs annually?

enielsen-air commented 1 year ago

Really appreciate your input, @rfuquaSLO. I want to reiterate that my proposal isn't definitively to move all of these data elements to the outlet file, it is just to explore which ones could be--so I listed the most likely candidates to start the exploration. As I said in my earlier reply, I think even having just visits moved to the outlet file would go a long way to understanding library use in a local context.

I really don't want to wear out my welcome in this wonderful community, but no one has yet explained why the burden would be so much higher to report the parts that they already have to collect to be able to report the sum that they are already reporting.

Furthermore, the fact that the visits figures are messy is a reason to capture them at the outlet level, in my opinion. If outlets are currently reporting estimates, they can continue to report estimates. But if AE figures are a combination of actual counts and estimates, wouldn't it be better to capture that than to have the sum as an estimate (which is the current guidance for the reporting methods questions)?

IMLS is currently exploring analytic methods that could enable measuring how libraries are related to strong communities. But those efforts are hindered by the lack of granular data for multi-outlet AEs.

KristenCooke commented 1 year ago

I wanted to chime in here. We are _very _granular with our survey.

Many of the libraries in our state may have only joined a consortium or system on paper while continuing to function very independently. Consortia in many cases in Arkansas are pro forma in order to qualify for state aid or formalize reciprocal borrowing. That creates a need for us to be more granular with our survey in order to contextualize the function and impact of our libraries better. For us, many of the questions that are asked of the AE on the PLS are also asked at the outlet level and auto-calculated to get a system total which is reported to IMLS.

I have a complicated relationship with our penchant for outlet-level questions mostly dependent upon what I am doing with the data at the moment. If I trying to understand a library's function within the community, it helps tremendously more than just AE-level data would. However, it does make the collection and submission to the portal a bit more complicated.

The most frustrating point is that any question asked at the outlet level and reported as -1 usually can require you to report the AE total as -1 as well. So, depending upon the edit checks, if one library in a system drops the ball, it could impact your ability to report a system total. If the problem is widespread, you can end up reporting less reliable data than when you were only reporting the AE total.

If we went this way, I think adding a reporting method question would be helpful (Estimate, Actual Total)

When I weigh the work it takes to ask granularly versus the benefit we get from it, I have to say I find it worth it. It has helped tremendously with capital campaigns, grant applications, and the like. However, we are a smaller state so I can't speak to scaling in larger states.

Kristen Cooke - Arkansas

angelakfox commented 1 year ago

Surprising no one, I would be extremely reluctant to see most of the elements considered here reported at the outlet level, largely for reasons that my colleagues have explained so well: increased burden on the larger systems; the possibility of making the overall data less reliable due to non-reporting or increased estimates; having to -1 entire systems for one missing branch, etc. But I think there's also something of an issue just with the optics of it all.

It speaks to Evan's point about how libraries must be collecting the data at the branch level in order to report the totals. Yes, libraries do need to collect that data somehow - but reporting them is something different. Reporting them is having that many more blank fields on the survey and needing to scroll down that much further to get to the bottom of the survey.

Reporting them is having the state tell you, “Yes, I know we've added so many questions in recent years. Yes, I know you’re extremely frustrated with all the new questions. Yes, I know you’re extremely frustrated with all the modifications to the new questions before you can even get used to them. Yes, we hear how burnt out you. Yes, I promise you we’re listening. And yes, we’re still going to add even more questions – questions that seem to you to be completely unnecessary and without benefit.”

It feels somewhat tone deaf.

Our libraries are struggling. They’re continually asked to take on more work with fewer resources. Their libraries have become cultural/political battlegrounds and many find their boards are being weaponized against them. Adding a few more questions to the PLS (and telling them look, you're already collecting the numbers!) may seem like a small thing, but for some libraries, it's just an indication that the state isn't listening or doesn't care.

enielsen-air commented 1 year ago

Thank you for sharing that perspective, @angelakfox. The last few years have been a lot for everyone, but essential community service providers like public libraries have borne the brunt and, just when the pandemic recovery is in sight, the polarization conflicts ratchet up.

Yet the PLS historically has been a moving target; the most stable the survey has been in the last 15 years was the period right before the pandemic from FYs 2017-2019--only one change over the three years--during a veritable sea-change in the role of public libraries in their communities. So we're all just trying to make sure that the PLS can be as useful for the field in the next decade(s) as it has been for the last three and a half.

Since multiple commenters have now raised this issue, I feel the need to address the idea that collecting data at the outlet level would "decrease the reliability of the data" due to higher missing rates. My interpretation of this anxiety is that we'd rather maintain our plausible deniability of the current reliability problems in those data. Consider this: rather than an AE reporting a total visits figure that was missing one or more of their branch figures (and we don't know they are missing), if we instead collected visits at the outlet level and could impute the missing values, then the final dataset would likely reflect reality better than when we didn't know an entire branch's visits were left out of a reported AE total. I have to invoke @mgolrick again: "Don't let them do math!"

KristenCooke commented 1 year ago

@enielsen-air

Thanks for the comment about imputation. The crux of any concern/issue that arises for me was less the plausible deniability, but more indicative of my resignation toward some of the elements where I consoled myself that some data was better than no data. In the midst of my load-shedding, those seemed to be the only options. I remember now how the imputation process would resolve this area of concern.

mgolrick commented 1 year ago

I am concerned (among other things) about the burden. The initial increase in burden is on the individual libraries. And as someone whose average ratio of outlets to AEs is 5 and the range of outlets is from 1 to 22, there will be burden. Most of the burden will fall on the larger library systems who generally tend to have the capacity. However, having been on the other side of the fence, as a library director including multiple years in a system (AE) with 5 outlets, I can tell you that part of what would be required is a complete restructuring of the data gathering and reporting system.

The larger ongoing burden, however, is on the SDCs who have outlet to AE ratios which are large. This proposal has a huge increase on the amount of data with which we would be required to deal. Remember that each data element moved from the AE to outlet will, for me, mean that instead of 68 numbers to review, I will now have 340.

The larger issue that @enielsen-air raises is about whether we are counting the right things. He is not wrong. Are we counting the right things? I am not sure that dramatically increasing the level of granularity is as useful on a national scale. I would be supportive of adding, and actually for a few years collected, data on meeting room use. That is something that measures the value of the library as place in a community. We have not collected information on "self-directed activities" (aka "passive"). The pandemic showed me the fact that many people enjoy that aspect of the service that some provide. We are wrestling with counting the shift to digital. I think those changes are important and useful - especially as we look on a national, aggregated scale. I think that is part of the huge value of the PLS, the national-level aggregated data. Adding granularity simply makes our jobs harder.

We may not agree on some aspects of this, but I do think it is an important conversation to have.

rfuquaSLO commented 1 year ago

I'd just like to say a quick thanks to everyone for this discussion. I deeply appreciate (and want to echo) @angelakfox 's thoughtful comments about the realities our public libraries are facing right now, and want to reiterate that it doesn't feel right to keep asking more and more from them. Also thanks to @mgolrick for acknowledging the increased burden on us SDCs that a change like this proposal could have on us! Sorry if I am derailing the thread, @enielsen-air! This comment might reflect more of where I'm at personally with this work, but I wonder if we as a community are approaching a point of inflection with the PLS? Like, maybe there is a growing opportunity for us to have more discussions about the "big picture" of the PLS, what it is and where it should be going? I know a lot of us like to dive deep into the details with this stuff, but it might be a good exercise for us to stop and look at how we could improve the overall process for everyone involved (especially libraries). The PLS represents a rich legacy of data we've all contributed to, and I'm not suggesting we chuck it all out the window.... It just sounds like a lot of us are all concerned about the sustainability of the effort, yet those concerns are often expressed in very different ways. Maybe a session topic for December? Who knows!

enielsen-air commented 1 year ago

@rfuquaSLO, you are not derailing the thread at all! It makes complete sense to me that this proposal has led to higher order questions about the scope of the PLS. I think Marisa would be open to starting a larger discussion in December, and the timing is good for her and the Mentors as they plan the SDC meeting agenda this fall. (Don't ask where it is, because I don't know!).

It seems to me that changing the reporting structure of metrics that are already tracked is a completely achievable objective, but it will require thorough understanding of the current systems used by libraries and SDCs to track and collect them.

If we want to be able to demonstrate, quantitatively, the role of libraries in their communities--not just as a way to access the latest John Grisham or Danielle Steel book for free, but as one of those essential "third places" where community members interact and learn about themselves, each other, and anything else under the sun--then we need to have more spatially granular data than we currently have for a county-wide (or multi-county) library system with 4-6 outlets. Maybe, as @mgolrick says, that is not feasible to do at the national level, but that just leads me back to @rfuquaSLO's question: what is the purpose of the PLS?

@mgolrick, I've been waiting for someone to propose the meeting room concept, as the LSWG discussed it briefly at the summer meeting this year. However, I have to ask: would you envision data elements about meeting rooms on the AE file or outlet file? IMHO, they clearly belong on the outlet file (at least a presence/count of rooms data element should go on the outlet file as a facilities metric).

MFAroWI commented 1 year ago

WI has primarily Administrative Entities with only a single central library. For the few libraries which have branches a separate survey is set up on our collection tool. Only the libraries with branches must submit this second survey. The proposal to move elements to the outlet level would require significant work in the collection tool and, depending on implementation, may increase burden to libraries by requiring all AEs to submit the second survey. As mentioned previously, the libraries with shared collections would need well defined guidance on how to report such collections.

KathleenSullivan commented 1 year ago

This is a good topic to bring up and worth more discussion at the December meeting. Candidly, I couldn't support adding these as outlet questions now, because I don't find these particular elements reliable even at the current AE level. Until the collection is basically automated (= fixed counting method, less burdensome, much more reliable), I wouldn't want to make them even more granular.

In Washington, tracking at the outlet level has come up more as an equity issue -- do communities have adequate FTEs? programs? refreshed collections? technology? We're exploring non-PLS ways of better tracking this for now.

mgolrick commented 3 months ago

A couple of shower thoughts:

While Information may want to be free, data is not free. It costs money to collect it and report it.
While much of the data may be collected at the outlet level by the library, it is most often used for internal management.
For some libraries, there will be data which is not available at the outlet level, while other libraries do have that data.
There is an additional large burden on the SLAA in any state where the ration of outlets to AEs is large. This is not insignificant.
Just because it may be available on a local level, does not mean that it can be readily collected and aggregated at a larger level.

buzzySLO commented 2 months ago

I think there is a large methodological issue with this proposal that hasn’t yet been raised, namely that this data will be so out of context that it will not be useful for its intended purpose to “allow data users, especially those interested in urban and suburban areas, to have access to more location-specific data.”

The comparability of PLS data across libraries relies on the ability to benchmark based on core attributes – service population, budget, and staffing - that have the largest impact on the other data elements; a library’s visits, collection size, programming stats, etc. will vary highly depending on the size of its service population, budget, and staff. This proposal would collect data elements but lack information about those core attributes, which need to be controlled/analyzed to properly compare libraries. The proposed outlet-level data elements minus the core attributes are devoid of context.

The obvious answer to this problem would be to collect data about those core attributes, but that’s far easier said than done. Multi-outlet AEs understandably centralize a lot of their functions like technical services, administration, technology, etc., so it would be well-nigh impossible to isolate budget or staffing at the outlet level other than very cursory information, such as the FTE devoted solely to those locations.

Determining service population would be even more challenging. It is incredibly unlikely that library outlets’ service areas could be cleanly tied to a particular Census-measured geography like a city, Census-designated place, or Census block. In reality, especially in sub/urban areas, library branch service areas bleed over multiple Census blocks. They may serve all of a few Census blocks and only portions of other ones, based on the geography of the community and the layout of major road arteries where the outlet is located. In rural areas, branches may serve unincorporated parts of counties for which accurate Census data is sparse. And I don’t know about y’all, but in Oregon we really don’t want to spend a bunch of time attempting to ascertain service areas for individual outlets; it’s hard enough to do it for our library districts that don’t follow city or county boundaries.

So as proposed, these data elements lack context. That context is too logistically challenging to collect. How then could a researcher using the proposed data produce a statistically reliable analysis to compare branches of different AEs? Much like the proposal from @rfuquaSLO to eliminate the reference data elements, I feel like this proposal would produce a lot of data that implies a level of statistical validity that it just doesn’t have. @mgolrick is exactly right: AEs collect this data for internal management. That doesn’t mean that it’s comparable across AEs.

As a final note, I think it’s also important to mention that multi-branch AEs are not exclusively in sub/urban areas with good data collection systems. Eight of Oregon’s 19 multi-outlet AEs serve fewer than 50,000 people, generally dispersed countywide populations. One of them serves 8,100 people with four branches spread over 8,300 square miles – larger than three US states, DC, and all the territories. Another serves 70,000 people with 12 branches over 6,200 square miles. Adding these data elements will be an even larger burden on these small, multi-outlet rural AEs.

enielsen-air commented 2 months ago

Thank you for your thoughtful contribution to this discussion, @buzzySLO. I agree with you that it will be important to have a metric to normalize the outlet-level data elements. I also agree that it is unrealistic for that metric to be service area population, since determining service area populations for each outlet would be incredibly difficult.

That said, service area population is the only metric IMLS currently uses to normalize AE-level data. We do not use expenditures or staffing for that purpose, but rather we normalize those data elements with service area population. So I don't foresee using expenditures or staffing to normalize outlet-level data. But another way of looking at this is that the AE-level per capita metrics that IMLS publishes are also lacking much of the context that you are concerned about with the outlet-level data. As has always been the case with PLS data, there is a limit to what quantitative analysis can tell about how libraries are similar and different in how they serve their communities.

The outlet-level data elements under consideration are mostly measures of activity, so we can understand how usage patterns vary at the local level. The most basic usage measure is the count of visits. So I could see some value in normalizing other usage metrics by visits. However, I acknowledge that doesn't solve the problem of how to normalize visits at the outlet level.

@rfuquaSLO's arguments for deleting REFERENC were very convincing. But we also must uncomfortably acknowledge that while that data element may be the most egregious in terms of incomparability, it is by no means the only one. Circulation periods vary between libraries (not to mention material types), and autorenewal policies also vary widely. Computer sessions vary in length. And, perhaps most worryingly of all given its use for normalization, service area populations are determined with different data sources and vintages from state to state--as IMLS and AIR have attempted to document. So while we are all in the business of collecting the most consistent, accurate data that we can, we also have to acknowledge that our data are abstractions and perfect representation of reality is unattainable.

You are also correct that multi-outlet libraries are not exclusively in urban and suburban areas. In fact, nationally, there are similar numbers of multi-outlet AEs classified in each of the four locale categories (city, suburb, town, rural): about 400-500 in each category. But multi-outlet AEs account for 67% of all city AEs, and only 10% of all rural AEs (suburb and town are both about 20%). And among outlets of city AEs, 95% are part of multi-outlet AEs, whereas 33% of rural AE outlets are part of multi-outlet AEs. My point is that we have to acknowledge the current data collection protocol results in a dearth of granular information about libraries in urban areas that is available for most rural libraries. Maybe this is fair for peer comparison purposes since larger, urban AEs often have more resources to use internal data for benchmarking, but it is decidedly unfair for anyone trying to use PLS data for any other research or policymaking purpose.

And I also agree with you that we absolutely need to take burden into account in this process, for the libraries and the SDCs. This is precisely the reason we are taking as long as we are to deliberate these changes and to collect additional input via the respondent research questionnaire that we are just wrapping up.

buzzySLO commented 2 months ago

Several thoughts/questions in response to your comment, @enielsen-air:

I understand that IMLS only uses service population as a way to normalize data. However, IMLS is not the sole user of this data. The libraries that provide this data also use it. As someone who managed rural public libraries for 15 years, I routinely used budget and staffing when determining what libraries might qualify as my peers; it simply didn’t make sense to compare the rural libraries I was at to wealthy suburban libraries that happened to serve a similar number of people. Budget and staffing are both important context too.
I agree that much of the PLS data also lacks similar context. That is not an argument in favor of approving this proposal. If anything, it’s more of an argument to critically look at existing data elements to consider adding more context to the data dictionary or another document on the limitations of those elements, determine whether they’re statistically rigorous enough to even continue collecting, or ideally both. Do the researchers and policymakers using PLS data even understand the fundamental weaknesses of many of the data elements?
It sounds like you agree that the outlet-level data proposed to be collected cannot be easily normalized. Given that, what kind of reliable conclusions could even be drawn from it that will help us understand local usage patterns? I will repeat again: it’s out-of-context data. You cannot make acceptably reliable conclusions if the data is poor and/or cannot be properly normalized.
I think you are missing my point about rural AEs. It makes sense that sub/urban libraries have a higher prevalence of multiple outlets, since they also tend to serve more people. My point is that collecting this data is a heavier burden for rural libraries, which tend to be under-resourced even when correcting for service population, and that needs to be considered. And much like the prevalence of multi-outlet AEs, some states have much higher proportions of rural libraries, such as Oregon.
I think I’m approaching this proposal using a different conception of fairness than you. PLS data is gathered nearly entirely by public employees, accounting for tens of thousands of hours. The vast majority of that time is paid using local, not state or federal, taxes. For that reason, it’s important that we can clearly articulate the value of any new data that we’re proposing to collect and how it’s going to benefit the public libraries upon whose labor we rely. Admittedly I have not been involved in all of the conversations about this proposal. However, I have yet to see a clear explanation of the kinds of questions that might be answered using this data and how the answers to those questions are going to benefit libraries. Are there clear research questions or policy proposals that could be supported by this data, keeping in mind that it will lack significant context and thus have limited statistical value? If so, how will that research or those policy proposals benefit public libraries?
With all due respect, I question that the burden of this proposal is properly being taken into account. Several SDCs here have made compelling arguments about the significant burden this will add for AEs and SDCs. At least in this conversation, those arguments have been uniformly brushed aside: AEs “already collect this data,” the nature of the burden hasn’t been properly “explained”, it’s fine if there are questions about the “reliability of the data,” etc. As someone who supervises one of those SDCs, I am concerned that their professional opinions, built from years of experience working directly with AEs and collating this data on behalf of their states, aren’t being respected.

It's clear that we’re all trying to benefit libraries in the ways we think are best. However, this discussion is fundamentally a cost-benefit analysis: does the value of collecting this data outweigh the costs it imposes? Several very smart SDCs here are saying that analysis decidedly comes out as negative. If all the concerns for this proposal were being properly weighed, it would not still be in consideration to add to the FY 2026 PLS.

Has it been considered to collect this data via PLA Benchmark, rather than through the PLS? That way, researchers could get what they want (the ability to compare multi-branch sub/urban libraries), libraries could choose whether to participate based on whether they find value in providing the data and having a larger dataset, a potentially wider range of questions could be asked since it would be voluntary, and the burden wouldn’t fall on SDCs to administer.

JHoganHI commented 2 months ago

Offering my perspective as a single AA, 51 outlet, 2 bookmobile statewide public library system.

How things work in HI: My primary job duties include running reports and compiling statistics for every outlet. Data collection runs the gamut from monthly circulation reports, number of service hours per year, programming statistics, database usage, eletronic resources usage, etc. etc. This data gets passed on to staff in the field, state agencies and entered into the PLS every year. Compiling data for my colleagues and stakeholders is something I enjoy and am highly committed to; however, it leaves me less time for data analysis, looking at the big picture, identifying trends, predictions, etc. I'm learning ways to speed up my daily processes by testing out new and different tools to streamline things. Some states have research departments that assist them with data analysis and that works well for them. I think we can learn a lot from them too.

I think it's worth asking with each data element we consider adding to the outlet level - who in your state is going to carry out this work? What else do they have already on their plate? Do you need to build staff capacity before expanding your data collection to the outlet level? Is it realistic for some places to manage, or will it result in more -1s?

Final things to ponder: how far will data collection tools advance in the next 5 -10 years? Maybe more sophisticated collection tools can level the playing field? Maybe it makes it worse if they come with a high cost and equity of access becomes a friction point. More granular data for some elements could certainly be good, but perhaps the question is when to pursue it?

I also really liked Ross's and others' suggestion to continue the discussion when we meet up. I'll also add the disclaimer that I've only ever worked in HI as a data coordinator, so I can't speak to what others see in other places and I'm still learning how our jobs integrate with each other at the national level through the PLS and IMLS's larger goals.

Just my 2 unicorn cents - Jessica

enielsen-air commented 2 months ago

@buzzySLO, I appreciate and respect that you are trying to guard the time and energy of the public employees you supervise and those you serve via library development. Perhaps it's not clear from the available context on this GitHub site, but this thread represents a very small proportion of the conversation that SDCs and others have had on this proposal. There are many SDCs who see the potential value of this proposal (and some who publicly support it), just as there are many who question the value (and some who oppose it). I strongly object to your characterization that anyone has "uniformly brushed aside" valid concerns from SDCs. There is enough interest in this proposal to continue exploration, which is what we have been and are still doing, but any decisions to implement these changes are far from made. The burden on SDCs will depend a lot on their data collection protocol: how easy or difficult is it to modify their data collection instrument. For a state like Oregon, which maintains its own data collection instrument, I fully admit that the burden will be higher than in some other states.

As PLA's Benchmark has evolved into a survey program that complements rather than duplicates the PLS, they no longer collect quantitative service measures. And even if they did, because their datasets are not publicly available, I'm not sure whether other researchers would be able to use the data. Nevertheless, one of the PLA's program managers sits on the LSWG, so I can certainly ask whether they have considered asking outlet-level questions.

Yes, we must attempt a cost-benefit analysis, but it is qualitative in nature--as it would be impossible to assign quantitative values to all of the associated costs and benefits--and reasonable people can disagree about the relative importance of each cost and benefit. This is one of the reasons why the Federal-State Cooperative System works the way it does: a supermajority of states must approve any items added or revised on the PLS.

Regarding research questions that could be answered with outlet-level data: Lisa Frehill (formerly of IMLS) and Melissa Cidade (formerly of the U.S. Census Bureau) tried to look at the relationship between public libraries and collective efficacy (i.e., a combination of social capital and civic participation). However, the Census Bureau data they had was a the block group level for urban areas, and they had to merge PLS data on visits, circulation, and program attendance at the AE level when it would have made a stronger analysis to have this data at the outlet level instead of using constant measures for every outlet in the AE. Yes, the outlet level data would need to be normalized, but just as you contextualize PLS data to meet your peer comparison needs, we can allow researchers to normalize outlet-level data in a way that makes sense for their research aims.

For other examples, it could suffice to normalize children's circulation as a proportion of total circulation (or children's programs as a proportion of total programs) as an indicator of how much a given library focuses their services on children. [Programming isn't on the table currently for moving to the outlet-level (and may never be) but this was an example that was top of my mind.] Metrics like number of computer sessions per computer also wouldn't require any per capita normalization, and could show that libraries in areas with low home internet access provide an essential service that is utilized more than in areas with higher home internet use--which would be relevant for both urban and rural areas.

buzzySLO commented 2 months ago

@enielsen-air

I just want to clarify, are you saying that when people raised concerns about this proposal in other forums, they were not met with responses similar to what was exhibited here, questioning whether they were properly judging their own potential burden and the burden on their AEs?
As @mgolrick and @megdepriest indicated in the first two comments on this proposal, the primary increase in burden comes not from a particular state's data collection work flow or software but rather from the relative proportion of AEs to outlets in that state. This proposal would entail collecting more data from more outlets. Actually, we're in far better shape in Oregon, as less than 15% of our AEs are multi-outlet. The burden of this falls quite heavily on states such as LA and CA. And honestly, I'm having a hard time not reading your comment that "the burden on SDCs will depend a lot on their data collection protocol" as further brushing aside legitimate criticism in the vein of earlier comments in this discussion. SDCs are saying this will be a burden on them and the AEs they work with. Please believe them and don't try to minimize what they're saying.
It's unfortunate that PLA Benchmark is getting away from collecting quantitative service measures. Perhaps a partnership with Urban Libraries Council would be more fruitful then, especially given the nature of the questions it sounds like researchers want to answer with this additional data? However, the fact that nobody else is collecting this data does not obligate us to do so. I really appreciated @JHoganHI's most recent comment about analyzing who already collects this data, what do they already have on their plate, and do they need added capacity. And I'll make a hopefully friendly addition to that list of asking how will they benefit by having this data collected at the national level.
The examples you gave on Frehill & Cidade's research and connecting computer sessions per workstation to areas with low home Internet uptake both rely upon establishing a connection between library outlets and some kind of geographic service area, something that I indicated in my first comment is far easier said that done. Does that mean we shouldn't try? No. But I think that it speaks to @mgolrick's point that the libraries who already collect this data do so for internal management purposes.
I think your example regarding children's material circulation also speaks to the utility of this data primarily for internal AE purposes: yes, analyzing children's circulation in a given branch can give a library system an idea of where it may want to focus children's programming. Does it follow therefore that we should collect outlet-level data on children's circulation nationally? No. The library is making a local decision; it doesn't need national-level data on other AE's children's circulation to make those decisions on how best to serve its very specific community. That speaks to a larger issue I see with this proposal: absent further articulation of the direct benefit to libraries, this proposal primarily benefits researchers. Yes, researchers are an audience for the PLS. However, the burden of data collection doesn't fall on the researchers, it falls on the AEs and the SDCs. That's why the bar should be high, the aforementioned supermajority, for adding new data elements.

Perhaps the SDC group and LSWG have already discussed this possibility, but I am increasingly wondering if certain PLS data elements should just be optional for states to report nationally. Absent a partnership with an entity like PLA or ULC, that may be a decent compromise, allowing states in the Federal-State Cooperative System who want to pursue this proposal to do so without inequitably burdening the states who do not.

KristenCooke commented 2 months ago

I don't have anything additional to add to the specific discussion of the metrics proposed here, but I did want to share that I was able to attend the annual meeting for APDU (Association of Public Data Users) and was able to hear several presentations from researchers in various fields that gave me insight into the way various fields of study use public data to publish insights or generate community change. My favorite story was from the New York Community Foundation that used several points of public data in conjunction with registration information from the Dolly Parton Imagination Library to convince landlords that it was in their best interest economically to eliminate lead in their rental properties when legislation had failed to do so.

APDU hosts various free webinars throughout the year. If you think it would be helpful to get some insight into how researchers use the types of data that we help to gather, you might enjoy listening in on one session. I can say that listening in to these presentations gave me a perspective I wouldn't have had otherwise. If it is comforting at all, APDU members also have hearty debates about the pros and cons of granularity. So, much like the rest of our lives, we aren't alone in this.

sadie-bruce-ok commented 2 months ago

Everyone has said such good things but I wanted to add what I heard from our system directors when I attended their bimonthly meeting. The main point they stressed was that this would be a huge burden. They did agree they can access the data but that turning data from one format to another takes time and this would be a lot of man hours. They also can't see the return on investment. I told them about being able to compare locations and they all, down to the last man, said they had better ways of comparison than the PLS. One director said she has contacts in every state and would just reach out to one of them if she needed to compare. In order to support this, they need to see why it would benefit them. I let them know other researchers use the data but they were uninterested.

I think so many people have brought up issues with this proposal but it's important to listen to the libraries who we rely on to give us the data. If our annual report wasn't attached to state aid, one director let me know if we implemented this they simply wouldn't do the report.