Hippocampome-Org / php

Hippocampome web portal

GAA: [neurites] There is a discrepancy for the number of contacts between the matrix and the tool #518

Closed drdiek closed 4 years ago

drdiek commented 4 years ago

From Giorgio: "Just noticed this during my call with Nate: for DG GC to DG AAC, number of contacts is 2.33 according to the table, but 2.42 according to the tool using the ‘chosen’ parameters (1.09, 6.2, 2). How do we reconcile?"

nmsutton commented 4 years ago

@drdiek here is the one for CA1; it appears to have the missing 219 connections: tool_ca1_3_distinct.xlsx

nmsutton commented 4 years ago

@drdiek here is CA3: tool_ca3_distinct_3.xlsx

nmsutton commented 4 years ago

@drdiek I ran it with CA2, DG, and SUB and didn't get any more connections than we already had for those. Unless you find any missing from them, they all check out, so those subregions are finished at this point.

drdiek commented 4 years ago

@nmsutton In terms of CA3, there appear to be some problems. Why are there only zero values for pre 2001 | CA3 Granule, 2008 | CA3 R-LM, and 2049 | CA3 QuadD-LM? Where are the entries for pre 2014 | CA3 Lucidum-Radiatum and 2021 | CA3 Interneuron Specific Oriens?

drdiek commented 4 years ago

@nmsutton There are also numerous cases for CA1 where there are only zero values.

drdiek commented 4 years ago

@nmsutton For EC, there are only zero values for pre 6019 | MEC LII Oblique Pyramidal.

drdiek commented 4 years ago

@nmsutton I am pasting in chunks of non-zero-valued CA3 entries as a check, and all of the values so far have wildly changed from earlier. Something is majorly wrong somewhere.

nmsutton commented 4 years ago

@drdiek thanks for checking this. I will take a look at what could have happened.

nmsutton commented 4 years ago

@drdiek also, I have now rerun csv2db with the new csv files you posted and uploaded that to phpdev. I have not recreated the json for the matrices based on that and will not add that unless you say it is needed. I will work on checking the new tool values now.

nmsutton commented 4 years ago

@drdiek I have now coded it differently and it appears to have fixed the issue. Here are CA1 tool_ca1_distinct_4.xlsx CA3 tool_ca3_distinct_4.xlsx EC tool_ec_distinct_5.xlsx. Apologies for not checking that more closely the last time.

drdiek commented 4 years ago

@nmsutton We still seem to be missing pre 2014 | CA3 Lucidum-Radiatum and 2021 | CA3 Interneuron Specific Oriens entries in CA3.

drdiek commented 4 years ago

@nmsutton For the most part CA1 is ok, but there are still 40 cases that I need to check into, which could take a while.

nmsutton commented 4 years ago

@drdiek Thanks for looking into this. "CA3 Lucidum-Radiatum" and "CA3 Interneuron Specific Oriens" are both on the exclude list. I do see both of those entries in the post column but not in the pre column. They are intentionally not in the pre column because of the exclude list. Do you see anywhere they are missing from the post column? I will check against Carolina's email today to get more info on this.

drdiek commented 4 years ago

@nmsutton EC has 88 discrepancies that I need to look into. I definitely have my work cut out for me.

nmsutton commented 4 years ago

@drdiek thanks for your hard work on this. We have been doing a good job and the end seems in sight. I just sent an email to you and Carolina about the exclude list. Another thing I want to look into is whether Nikhol ever accounted in the code for the 7 neuron types that should have both pre and post excluded. He may have only excluded pre in those.

nmsutton commented 4 years ago

@drdiek here are the newly updated CA3 tool_ca3_distinct_5.xlsx and EC tool_ec_distinct_6.xlsx. Those are the only ones that might have changed with the exclusion list updates. Of the files, CA3 increased by 19 connections and EC had the same number of connections, so only CA3 might have ended up changing connections. I also included new code to exclude "reconstructions lacking" neuron types from post entries as well.

drdiek commented 4 years ago

@nmsutton The CA3 listing is still missing the 19 2014 | CA3 Lucidum-Radiatum (-)03300 entries.

nmsutton commented 4 years ago

@drdiek good catch. My mistake, I overlooked removing that from the pre list. Here is an updated version CA3 tool_ca3_distinct_6.xlsx. It shows 17 added entries. Perhaps 2 of the additional ones you mentioned were omitted because of pre or post types on the exclude list?

drdiek commented 4 years ago

@nmsutton Oops, my mistake. There were only 17 missing entries, so you should have them all now.

drdiek commented 4 years ago

@nmsutton I need you to please re-run CA1 for me. I just had to update a number of values in the CA1-Table-1.csv file.

Good news is that I tightened the requirement from 10% differences to 5% differences, and we now have complete matches for DG, CA3, CA2, and Sub.
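The percentage thresholds mentioned in this thread (10%, 5%, and later 2.5%) can be illustrated with a small relative-difference check. This is only a sketch: the helper names `relativeDifference` and `withinTolerance` are hypothetical, and dividing by the reference value is an assumption, since the thread does not say how the percentage difference is actually computed. The example values are the 2.33 (matrix) and 2.42 (tool) contacts quoted in the opening comment.

```javascript
// Hypothetical helper: relative difference of a tool value against a reference value.
// Assumption: the difference is measured relative to the reference (e.g. the matrix value).
function relativeDifference(toolValue, referenceValue) {
  if (referenceValue === 0) {
    // Avoid dividing by zero: identical zeros match, anything else does not.
    return toolValue === 0 ? 0 : Infinity;
  }
  return Math.abs(toolValue - referenceValue) / Math.abs(referenceValue);
}

// Hypothetical helper: true when the two values agree within the given fraction.
function withinTolerance(toolValue, referenceValue, tolerance) {
  return relativeDifference(toolValue, referenceValue) <= tolerance;
}

// DG GC to DG AAC number of contacts from the opening comment.
const matrixNoc = 2.33;
const toolNoc = 2.42;

console.log(withinTolerance(toolNoc, matrixNoc, 0.10));  // true
console.log(withinTolerance(toolNoc, matrixNoc, 0.05));  // true
console.log(withinTolerance(toolNoc, matrixNoc, 0.025)); // false
```

Under this assumed definition the pair differs by roughly 3.9%, so it passes the 10% and 5% thresholds but would be flagged at 2.5%.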

nmsutton commented 4 years ago

@drdiek here is the new CA1: tool_ca1_distinct_5.xlsx

nmsutton commented 4 years ago

@drdiek also, great to hear about the progress!

drdiek commented 4 years ago

@nmsutton Could you please re-run CA1 again for me?

nmsutton commented 4 years ago

@drdiek here is CA1 with the newest tables from Github: tool_ca1_distinct_6.xlsx.

nmsutton commented 4 years ago

@drdiek I made a new version of the tool for you that exports subregions. It is available in the repo under the name connprob_export_subreg.php. Just open the page, select a subregion, then press export. If you are still having problems getting the local version of the site to run, the page is also available here: http://www.hippocampome.org/phpdev/connprob_export_subreg.php . Just upload the latest table files to the server and it will automatically use them. I have added the latest table files to the phpdev server currently.

Hopefully this can help with more rapid testing. If the tool's code no longer has any issues, this can serve as a way to get the exports. Things I do in post-processing are 1.) add prob and noc to the column names and 2.) sort by the source_id and target_id columns in the xlsx. I imagine you can process that when you want as well. I am fine with continuing to provide subregion exports on request if wanted, but I thought this might help as well.

drdiek commented 4 years ago

@nmsutton Does the tool make all of the calculations with unrounded numbers and round only at the end, or do the numbers get rounded at intermediate stages? I am finding a discrepancy between my hand calculations and the numbers generated by the tool. The difference is subtle, but it is there.

nmsutton commented 4 years ago

@drdiek the tool makes calculations with unrounded numbers and only rounds at the end. There is no rounding at intermediate stages. The display of NoC and probability is rounded but the calculations are not. Can you please share what the subtle difference is that you are seeing (with an example)?

drdiek commented 4 years ago

@nmsutton It is probably all due to the difference in precision of the server system and Excel on my laptop. The difference is less than 5%, but the rounding is just slightly off, and I thought it might be explained away by rounding errors, but in this case it is most likely system precision.

drdiek commented 4 years ago

@nmsutton I have come across a discrepancy between the tool's values and my hand-calculated values. For CA1 Back-Projection [4023] to CA1 Schaffer Collateral-Associated [4015], here are your values on top and mine below:

SLM | SR | SP | SO | Total | SLM | SR | SP | SO | Total
0.002106 | 0.002889 | 0 | 0 | 0.004989 | 1.23 | 2.99 | 0 | 0 | 4.22
0.002583681 | 0.008537459 | 0 | 0 | 0.01112114 | 1.226882025 | 3.385551588 | 0 | 0 | 4.61243361

As you can see, the biggest differences appear to be in the CA1:SR calculations. Could you please look into this? I can't imagine why this would be happening in this one particular case, when everything else has been aligning quite well.

drdiek commented 4 years ago

@nmsutton A similar problem for CA1 Back-Projection [4023] to CA1 Interneuron Specific LMO-O [4022]:

SLM | SR | SP | SO | Total | SLM | SR | SP | SO | Total
0.00283 | 0 | 0 | 0.002409 | 0.005232 | 1.79 | 0 | 0 | 2.08 | 3.87
0.005068493 | 0 | 0 | 0.002701159 | 0.00776965 | 1.791220372 | 0 | 0 | 1.351074785 | 3.14229516

nmsutton commented 4 years ago

@drdiek I will look into this

nmsutton commented 4 years ago

@drdiek below is some debug output for CA1 Back-Projection to CA1 Schaffer Collateral-Associated. Please help identify where it appears to diverge from your calculations and let me know what you find, so we can pin down more specifically what may be wrong with the calculations. Edit: I added NOC formulas now too.

NOC Values

Note: the formula below applies if noc != 0, otherwise num_contacts[i] = 0, and noc = (4 * c * length_axons[i] * length_dendrites[i]) / (volume_axons[i] + volume_dendrites[i])

SLM
noc: 1.226882025526354, c: 4.958615217267108, length_axons[i]: 5721.893247, length_dendrites[i]: 852.9395402, volume_axons[i]: 122008310.2, volume_dendrites[i]: 11164164.19
Formula: 1.226882025526354 = (1 / 2) + (4 * 4.958615217267108 * 5721.893247 * 852.9395402) / (122008310.2 + 11164164.19)

SR
noc: 2.9909607527181943, c: 4.958615217267108, length_axons[i]: 5993.625567, length_dendrites[i]: 3184.28414, volume_axons[i]: 87479224.38, volume_dendrites[i]: 64489751.69
Formula: 2.9909607527181943 = (1 / 2) + (4 * 4.958615217267108 * 5993.625567 * 3184.28414) / (87479224.38 + 64489751.69)

SP
noc: 0, c: 4.958615217267108, length_axons[i]: 6404.608473, length_dendrites[i]: 0, volume_axons[i]: 57462603.88, volume_dendrites[i]: 0
Formula: 0 (due to noc = 0) = (1 / 2) + (4 * 4.958615217267108 * 6404.608473 * 0) / (57462603.88 + 0)

SO
noc: 0, c: 4.958615217267108, length_axons[i]: 8560.665963, length_dendrites[i]: 0, volume_axons[i]: 82903047.09, volume_dendrites[i]: 0
Formula: 0 (due to noc = 0) = (1 / 2) + (4 * 4.958615217267108 * 8560.665963 * 0) / (82903047.09 + 0)

Probability Calc Values

SLM final_result_val
c: 4.958615217267108, length_axons[i]: 5721.893247, length_dendrites[i]: 852.9395402, volumes_array[i]: 9366545922, num_contacts[i]: 1.226882025526354
Formula: 0.002106 = (4.958615217267108 * ((5721.893247 * 852.9395402) / 9366545922)) / 1.226882025526354

SR final_result_val
c: 4.958615217267108, length_axons[i]: 5993.625567, length_dendrites[i]: 3184.28414, volumes_array[i]: 10951817102, num_contacts[i]: 2.9909607527181943
Formula: 0.002889 = (4.958615217267108 * ((5993.625567 * 3184.28414) / 10951817102)) / 2.9909607527181943

SP final_result_val
c: 4.958615217267108, length_axons[i]: 6404.608473, length_dendrites[i]: 0, volumes_array[i]: 3436800000, num_contacts[i]: 0
Formula: 0.000 = (4.958615217267108 * ((6404.608473 * 0) / 3436800000)) / 0

SO final_result_val
c: 4.958615217267108, length_axons[i]: 8560.665963, length_dendrites[i]: 0, volumes_array[i]: 7686802192, num_contacts[i]: 0
Formula: 0.000 = (4.958615217267108 * ((8560.665963 * 0) / 7686802192)) / 0

End Results

CA1 Back-Projection,CA1 Schaffer Collateral-Assoc

                SLM,SR,SP,SO,Total
Probabilities: 0.002106,0.002889,0.000,0.000,0.004989
NOC: 1.23,2.99,0.00,0.00,4.22

nmsutton commented 4 years ago

@drdiek below are the values for CA1 Back-Projection to CA1 Interneuron Specific LMO-O (not sure why they are slightly different from your values, maybe just my computer vs. yours). Edit: I looked into the slight difference and found that my computer and the server appear to calculate the values slightly differently using the same code (total prob. 0.005122 on my PC vs. 0.005232 on the server). That seems likely to be the reason. It does not explain the larger difference from your hand calculations, however.

NOC Values

Note: the formula below applies if noc != 0, otherwise num_contacts[i] = 0, and noc = (4 * c * length_axons[i] * length_dendrites[i]) / (volume_axons[i] + volume_dendrites[i])

SLM
noc: 1.8335522067701928, c: 4.958615217267108, length_axons[i]: 5721.893247, length_dendrites[i]: 1673.23955, volume_axons[i]: 122008310.2, volume_dendrites[i]: 20391119.53
Formula: 1.8335522067701928 = (1 / 2) + (4 * 4.958615217267108 * 5721.893247 * 1673.23955) / (122008310.2 + 20391119.53)

SR
noc: 0, c: 4.958615217267108, length_axons[i]: 5993.625567, length_dendrites[i]: 0, volume_axons[i]: 87479224.38, volume_dendrites[i]: 0
Formula: 0 (due to noc = 0) = (1 / 2) + (4 * 4.958615217267108 * 5993.625567 * 0) / (87479224.38 + 0)

SP
noc: 0, c: 4.958615217267108, length_axons[i]: 6404.608473, length_dendrites[i]: 0, volume_axons[i]: 57462603.88, volume_dendrites[i]: 0
Formula: 0 (due to noc = 0) = (1 / 2) + (4 * 4.958615217267108 * 6404.608473 * 0) / (57462603.88 + 0)

SO
noc: 2.115072956235414, c: 4.958615217267108, length_axons[i]: 8560.665963, length_dendrites[i]: 905.8023223, volume_axons[i]: 82903047.09, volume_dendrites[i]: 12325955.6
Formula: 2.115072956235414 = (1 / 2) + (4 * 4.958615217267108 * 8560.665963 * 905.8023223) / (82903047.09 + 12325955.6)

Probability Calc Values

SLM final_result_val
c: 4.958615217267108, length_axons[i]: 5721.893247, length_dendrites[i]: 1673.23955, volumes_array[i]: 9366545922, num_contacts[i]: 1.8335522067701928
Formula: 0.002764 = (4.958615217267108 * ((5721.893247 * 1673.23955) / 9366545922)) / 1.8335522067701928

SR final_result_val
c: 4.958615217267108, length_axons[i]: 5993.625567, length_dendrites[i]: 0, volumes_array[i]: 10951817102, num_contacts[i]: 0
Formula: 0.000 = (4.958615217267108 * ((5993.625567 * 0) / 10951817102)) / 0

SP final_result_val
c: 4.958615217267108, length_axons[i]: 6404.608473, length_dendrites[i]: 0, volumes_array[i]: 3436800000, num_contacts[i]: 0
Formula: 0.000 = (4.958615217267108 * ((6404.608473 * 0) / 3436800000)) / 0

SO final_result_val
c: 4.958615217267108, length_axons[i]: 8560.665963, length_dendrites[i]: 905.8023223, volumes_array[i]: 7686802192, num_contacts[i]: 2.115072956235414
Formula: 0.002365 = (4.958615217267108 * ((8560.665963 * 905.8023223) / 7686802192)) / 2.115072956235414

End Results

CA1 Back-Projection,CA1 Interneuron Specific LMO-O

                SLM,SR,SP,SO,Total
Probabilities: 0.002764,0.000,0.000,0.002365,0.005122
NOC: 1.83,0.00,0.00,2.12,3.95

drdiek commented 4 years ago

@nmsutton Thanks. I managed to find a very subtle typo that snuck into one of my cell calculations, which, thankfully, only affects the CA1 BP calculations.

nmsutton commented 4 years ago

@drdiek so the values are fixed now? I uploaded to Github and the server http://www.hippocampome.org/phpdev/connprob_debug.php in case it is useful. It creates the report seen in earlier comments. I also modified lines 167 and 168 in connprob files:

if (isNaN(num_contacts[i])) {num_contacts[i] = 0;}
if (!isFinite(num_contacts[i])) {num_contacts[i] = 0;}

to come before the final_result_val formula that uses num_contacts[i]. I have not observed this creating different results, and I don't know that it does, but conceptually at least this seems better.
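As a rough illustration of why the guard belongs before the probability step, the per-layer calculation from the debug report earlier in this thread can be sketched as follows. This is not the tool's actual code: the function name `layerContactsAndProbability` and its parameter names are hypothetical, `Number.isFinite` stands in for the separate `isNaN` and `!isFinite` checks, and returning 0 when there are no contacts is an assumption based on the 0.000 shown for empty layers in the report.

```javascript
// Sketch of one layer's calculation, following the formulas in the debug report.
// All names here are illustrative and not taken from the connprob files.
function layerContactsAndProbability(c, lengthAxons, lengthDendrites,
                                     volumeAxons, volumeDendrites, layerVolume) {
  const noc = (4 * c * lengthAxons * lengthDendrites) /
              (volumeAxons + volumeDendrites);

  // Per the report's Formula lines: (1 / 2) + noc when noc != 0, otherwise 0.
  let numContacts = noc !== 0 ? 0.5 + noc : 0;

  // Guard BEFORE the value is used below. Number.isFinite is false for
  // NaN, Infinity and -Infinity, so it covers both original checks.
  if (!Number.isFinite(numContacts)) {
    numContacts = 0;
  }

  // Assumption: a layer without contacts contributes probability 0,
  // which also avoids computing 0 / 0.
  const probability = numContacts === 0
    ? 0
    : (c * ((lengthAxons * lengthDendrites) / layerVolume)) / numContacts;

  return { numContacts, probability };
}

// SLM values for CA1 Back-Projection to CA1 Schaffer Collateral-Associated.
const slm = layerContactsAndProbability(
  4.958615217267108, 5721.893247, 852.9395402,
  122008310.2, 11164164.19, 9366545922);
console.log(slm.numContacts.toFixed(2), slm.probability.toFixed(6));

// SP has no dendritic length, so both results are 0.
const sp = layerContactsAndProbability(
  4.958615217267108, 6404.608473, 0, 57462603.88, 0, 3436800000);
console.log(sp.numContacts, sp.probability);
```

With the SLM inputs from the report, this sketch gives about 1.23 contacts and a probability of about 0.002106, in line with the reported values.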

drdiek commented 4 years ago

@nmsutton I changed my parameters so that differences need to be within 2.5%, and several differences appeared in CA2. I am having trouble with all of the cases with presynaptic CA1 Bistratified. For example with postsynaptic CA2 Pyramidal, I believe there is a summing error concerning the probabilities. According to your debug output the result is:

Probabilities: 0.000,0.03130,0.02419,0.02851,0.08168

Please take a look at the numbers yourself, as the sum of 0.03130+0.02419+0.02851 is 0.084 and not 0.08168. Something screwy is going on.

nmsutton commented 4 years ago

@drdiek I will look into this. Btw, I think you mean CA2 Bistratified-CA2 Pyramidal? No issues with CA1 Bistratified, right?

nmsutton commented 4 years ago

@drdiek the difference with CA2 Bistratified-CA2 Pyramidal is that the probability total is not the sum of all probabilities. In the email with the subject "Connection Probabilities Tool" from Apr. 2nd, Giorgio stated he wanted to use the formula 1-((1-Px)*(1-Py)*...) for the total probability. Therefore, 1-((1-0.03130)*(1-0.02419)*(1-0.02851)) = 0.081682419. Does this help clarify your question?

Please tell me, should that formula be listed somewhere on the page to not confuse users in the future?
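For reference, the total-probability formula quoted above, 1-((1-Px)*(1-Py)*...), can be checked with a few lines of JavaScript. This is a minimal sketch; the function name `totalProbability` is hypothetical and not taken from the tool's code. The inputs are the layer probabilities from the debug output under discussion.

```javascript
// Combine per-layer probabilities as 1 - product(1 - p_i),
// i.e. the probability of a connection in at least one layer.
function totalProbability(layerProbabilities) {
  const probabilityOfNone = layerProbabilities.reduce(
    (product, p) => product * (1 - p), 1);
  return 1 - probabilityOfNone;
}

// Layer probabilities from the debug output quoted in this thread.
const layers = [0.000, 0.03130, 0.02419, 0.02851];

const plainSum = layers.reduce((sum, p) => sum + p, 0);
console.log(plainSum.toFixed(3));                  // 0.084
console.log(totalProbability(layers).toFixed(5));  // 0.08168
```

The plain sum (0.084) is always at least as large as the combined value (0.08168) because the sum double-counts the cases where more than one layer forms a connection.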

drdiek commented 4 years ago

@nmsutton Thanks. Unfortunately, this means that a lot of my probability totals are off slightly. Fortunately, they are within 2.5%, so I am not going to overly sweat them.

drdiek commented 4 years ago

@nmsutton Giorgio wants us to update the formulas in Carolina's spreadsheet, so we are not quite done yet, at least on that end. I am quite confident now in the tool's output.